As you may already know, I really, really like LINQ. One day I’ll probably join together all my posts about this incredible feature and release a pretty neat compendium/one-oh-one about it. But until I sit down and join every post from this blog that has the word “LINQ” in it into one big pile, let’s talk a bit about joining and grouping collections in LINQ.
Arranging collections by some sort of key or other common value is an easy task with LINQ. Just think about basic SQL joins with foreign keys, ids and one-to-one or one-to-many relationships in databases. It’s simple, basic database stuff, and similar things are possible with LINQ.
However, we may encounter the same problem that I mentioned a few weeks ago in my post about the Aggregate method: Join, GroupJoin and even some overloads of GroupBy look complicated. You probably know how they work, but if you’ve never used them before, just looking at method definitions full of generic Funcs can make you think “oh crap, I’ll just use a simple for loop instead”. I really hope that after this post everything will be clear and you will see that those methods are as simple as a basic SQL join.
I know that descriptions of complicated generic methods with many parameters related to each other are not as readable as I would like them to be. Feel free to download the code samples for this post from my GitHub.
Basic GroupBy
Let’s do a little warmup: a boring paragraph about the simplest grouping method in LINQ, GroupBy. This will be the most basic usage of the method. It has some more powerful overloads and I’ll get back to them in a while, but for now simple things first. If you already know this method, you can jump straight to the section dedicated to Join, since the next few paragraphs cover rather basic stuff.
Imagine a shopping list, something with a product name and a category. Something like this:
    public static IEnumerable<Product> myShoppingList
    {
        get
        {
            yield return new Product("Bacon", "Grocery");
            yield return new Product("Batteries", "Electronics");
            yield return new Product("Juice", "Grocery");
            yield return new Product("Bread", "Grocery");
            yield return new Product("Painkillers", "Pharmaceuticals");
        }
    }

    public class Product
    {
        public string Name { get; set; }
        public string Type { get; set; }

        public Product(string _name, string _type)
        {
            Name = _name;
            Type = _type;
        }

        public override string ToString()
        {
            return $"Product: {Name}. Type: {Type}";
        }
    }
During your shopping run (or supply run in case of some kind of apocalypse scenario) you will most certainly need to organize your list by category, to make sure you visit only the necessary stores and buy all items in one visit. The perfect method for this is GroupBy. So let’s use it and print some results:
    var groupedList = myShoppingList.GroupBy(x => x.Type);

    foreach (var group in groupedList)
    {
        Console.WriteLine($"Items in group {group.Key}:");
        foreach (var item in group)
        {
            // The "- " prefix matches the console output shown below.
            Console.WriteLine($"- {item}");
        }
    }
Result in console should look just like that:
    Items in group Grocery:
    - Product: Bacon. Type: Grocery
    - Product: Juice. Type: Grocery
    - Product: Bread. Type: Grocery
    Items in group Electronics:
    - Product: Batteries. Type: Electronics
    Items in group Pharmaceuticals:
    - Product: Painkillers. Type: Pharmaceuticals
What happened? We’ve just grouped every item by its Type. The variable groupedList is of type IEnumerable<IGrouping<string, Product>>; you can look up IGrouping in the .NET documentation for details. The point is that an IGrouping has a Key and contains all the values that share that key. In simpler words, it is a collection of collections where every item you had at the beginning is now filed under a key. It’s NOT a dictionary, but it can easily be turned into one like this:
    Dictionary<string, List<Product>> dictionary =
        groupedList.ToDictionary(g => g.Key, g => g.ToList());
    Dictionary<string, int> countDictionary =
        groupedList.ToDictionary(g => g.Key, g => g.Count());
    Dictionary<string, IOrderedEnumerable<Product>> orderedValuesDictionary =
        groupedList.ToDictionary(g => g.Key, g => g.OrderBy(x => x.Name));
Or you could just filter groups with Where and Select results:
    IEnumerable<string> groceryList = groupedList.Where(group => group.Key == "Grocery")
        .SelectMany(group =>
        {
            return group.AsEnumerable().Select(item => item.Name);
        });
Take a closer look at IGrouping (i.e. group in the example above). You can access its key with the group.Key property, and you can call ToList, ToArray or AsEnumerable on it, or just iterate through it: group (or rather its type, IGrouping) implements IEnumerable and contains every item with an equal key (and yes, you can use custom comparers in GroupBy overloads).
Let’s leave GroupBy for a while. I’ll get back to its overloads soon; for now let’s jump into the fun stuff: joining things together.
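To make the remark about custom comparers a bit more concrete, here is a minimal sketch. The class name GroupByComparerSketch, the method CountByName and the sample names are made up for this illustration; only GroupBy, Select, Count and StringComparer.OrdinalIgnoreCase come from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByComparerSketch
{
    // Hypothetical helper: groups names case-insensitively
    // and returns one "key:count" string per group.
    public static List<string> CountByName(IEnumerable<string> names)
    {
        return names
            .GroupBy(name => name, StringComparer.OrdinalIgnoreCase)
            // Each group is an IGrouping<string, string>:
            // it exposes Key and can be enumerated like any other sequence.
            .Select(group => $"{group.Key}:{group.Count()}")
            .ToList();
    }

    public static void Main()
    {
        var names = new[] { "Bob", "bob", "Jim", "BOB" };
        foreach (var line in CountByName(names))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Bob:3
        // Jim:1
    }
}
```

Note that the key of each group is the key of the first item that landed in it, which is why the group is called “Bob” and not “bob”.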
Join
Let’s change our context a bit. Imagine you want to import all of your contacts from your old phone and from your email client, and then synchronize them into some new, fancy system. You’ll have two collections of contact data that look like this:
    public static IEnumerable<PhoneContactItem> phoneBook
    {
        get
        {
            yield return new PhoneContactItem("Rafal", "555 555");
            yield return new PhoneContactItem("John", "555 666");
            yield return new PhoneContactItem("Bob", "555 777");
            yield return new PhoneContactItem("Jim", "555 888");
        }
    }

    public static IEnumerable<MailContactItem> mailBook
    {
        get
        {
            yield return new MailContactItem("Rafal", "rafal@(thisblogdomain).net");
            yield return new MailContactItem("John", "john@somedomain.com");
            yield return new MailContactItem("bob", "bob@somedomain.com");
            yield return new MailContactItem("jim", "jim@somedomain.com");
        }
    }

    public class PhoneContactItem
    {
        public string Name { get; set; }
        public string PhoneNumber { get; set; }

        public PhoneContactItem(string _name, string _number)
        {
            Name = _name;
            PhoneNumber = _number;
        }
    }

    public class MailContactItem
    {
        public string Name { get; set; }
        public string Email { get; set; }

        public MailContactItem(string _name, string _email)
        {
            Name = _name;
            Email = _email;
        }
    }

    public class AddressBookItem
    {
        public string Name { get; set; }
        public string Email { get; set; }
        public string PhoneNumber { get; set; }

        public AddressBookItem(string _name, string _email, string _number)
        {
            Name = _name;
            Email = _email;
            PhoneNumber = _number;
        }

        public override string ToString()
        {
            return $"Name: {Name}. Email: {Email}. PhoneNumber: {PhoneNumber}";
        }
    }
We have our simple model, so let’s try to synchronize our data. The only things we can use as keys are the Name properties. In the real world that wouldn’t be the best idea (how many Johns do you know?), but for the sake of this example let’s agree that Name will be our key and we will join the two collections by this property.
Since the two collections contain two different types of objects, we can’t use GroupBy (well, not directly; we could make two dictionaries with names as keys and access the data this way, but I would not recommend this solution). If these were SQL tables everything would be much simpler, right? As simple as this:
    SELECT phoneBookItem.[Name], mailBookItem.[Email], phoneBookItem.[PhoneNumber]
    FROM [dbo].[phoneBook] phoneBookItem
    JOIN [dbo].[mailBook] mailBookItem
        ON mailBookItem.[Name] = phoneBookItem.[Name]
Sadly our data isn’t in SQL, but you may remember that LINQ has a query syntax which closely resembles SQL queries. Let’s stick to that for a while, because joins in LINQ are simpler and easier to understand in query syntax. To join our collections, project them to AddressBookItem and print the results, you could use this code:
    var querySyntaxAddressBook = from phoneBookItem in phoneBook
                                 join mailBookItem in mailBook on phoneBookItem.Name equals mailBookItem.Name
                                 select new AddressBookItem(phoneBookItem.Name, mailBookItem.Email, phoneBookItem.PhoneNumber);

    foreach (var item in querySyntaxAddressBook)
        Console.WriteLine(item.ToString());
Our console should now show this:
    Name: Rafal. Email: rafal@(thisblogdomain).net. PhoneNumber: 555 555
    Name: John. Email: john@somedomain.com. PhoneNumber: 555 666
So everything is great. Well, almost everything: two of our contacts are missing, but let’s put that aside for a moment. I’ll not explain the query syntax from the above example right now, because while it has its uses I don’t really like it. Instead let’s write the exact same query in normal, chained LINQ. In the following examples I’ll use named method arguments as well as more explicit names in lambdas, because explaining things is much easier this way. Consider using those in your code for added clarity in more complicated cases.
    var addressBook = phoneBook.Join(
        inner: mailBook,
        outerKeySelector: phoneBookItem => phoneBookItem.Name,
        innerKeySelector: mailBookItem => mailBookItem.Name,
        resultSelector: (phoneBookItem, mailBookItem) => new AddressBookItem(phoneBookItem.Name, mailBookItem.Email, phoneBookItem.PhoneNumber));
We passed our two collections to the Join method. The first one, phoneBook, is the collection the extension method is called on (this parameter is named “outer”). The second collection is passed as the “inner” parameter. Then come two delegates, the outer and inner key selectors; they produce our keys, and when the keys are equal the items are joined together. The last parameter is “resultSelector”, a delegate that takes the two joined items (outer first, then inner) and returns the result.
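If the relation between those four parameters still feels abstract, a hand-written version may help. The sketch below is my own simplified illustration (NaiveJoin is a made-up name, not the actual .NET implementation, which also handles details such as argument validation and null keys): it indexes the inner collection by key and then, for every outer item, calls the result selector once per matching inner item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinSketch
{
    // Simplified, hypothetical re-implementation of an inner join,
    // written only to show how the parameters work together.
    public static IEnumerable<TResult> NaiveJoin<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        // Index the inner collection by its key once.
        ILookup<TKey, TInner> innerByKey = inner.ToLookup(innerKeySelector);

        foreach (TOuter outerItem in outer)
        {
            // A lookup returns an empty sequence for a missing key,
            // so outer items without a match produce no results at all.
            foreach (TInner innerItem in innerByKey[outerKeySelector(outerItem)])
            {
                yield return resultSelector(outerItem, innerItem);
            }
        }
    }

    public static void Main()
    {
        var phones = new[] { (Name: "Rafal", Phone: "555 555"), (Name: "Bob", Phone: "555 777") };
        var mails = new[] { (Name: "Rafal", Email: "rafal@example.com"), (Name: "bob", Email: "bob@example.com") };

        var joined = NaiveJoin(
            phones,
            mails,
            phone => phone.Name,
            mail => mail.Name,
            (phone, mail) => $"{phone.Name}: {mail.Email}, {phone.Phone}");

        foreach (var line in joined)
        {
            Console.WriteLine(line);
        }
        // Prints only one line, because "Bob" and "bob" are different keys here:
        // Rafal: rafal@example.com, 555 555
    }
}
```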
Above example is not so complicated, but is it valid? Let’s see printed results.
    Name: Rafal. Email: rafal@(thisblogdomain).net. PhoneNumber: 555 555
    Name: John. Email: john@somedomain.com. PhoneNumber: 555 666
Ooops! Something went wrong: Bob and Jim are missing, because I’ve forgotten about case sensitivity. My two imported contact lists contain the same names, but in different letter case. Please note that Join (and the join clause in query syntax, which is translated to the same method) compares keys with the default, case-sensitive comparer, and that items without a match are silently skipped without any exception. Now let’s use the Join overload with the “comparer” parameter; this should do the trick.
    var caseInsensitiveAddressBook = phoneBook.Join(
        inner: mailBook,
        outerKeySelector: phoneBookItem => phoneBookItem.Name,
        innerKeySelector: mailBookItem => mailBookItem.Name,
        resultSelector: (phoneBookItem, mailBookItem) => new AddressBookItem(phoneBookItem.Name, mailBookItem.Email, phoneBookItem.PhoneNumber),
        comparer: StringComparer.InvariantCultureIgnoreCase);
I don’t really want to paste the console results again, but believe me, it worked like a charm: all four contacts are there. Our two lists were successfully joined and lived happily ever after. Well… until you remember that you had a more complex contact list with emails somewhere and you want to synchronize it too.
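Because Join drops unmatched items silently, it can be worth checking which contacts would be left out before trusting the result. The helper below is only a sketch of that idea; UnmatchedNames and the class around it are hypothetical names, and it works on plain strings to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnmatchedSketch
{
    // Hypothetical helper: returns the phone book names that have
    // no counterpart in the mail book under the given comparer.
    public static List<string> UnmatchedNames(
        IEnumerable<string> phoneNames,
        IEnumerable<string> mailNames,
        IEqualityComparer<string> comparer)
    {
        var mailSet = new HashSet<string>(mailNames, comparer);
        return phoneNames.Where(name => !mailSet.Contains(name)).ToList();
    }

    public static void Main()
    {
        var phoneNames = new[] { "Rafal", "John", "Bob", "Jim" };
        var mailNames = new[] { "Rafal", "John", "bob", "jim" };

        var caseSensitive = UnmatchedNames(phoneNames, mailNames, StringComparer.Ordinal);
        var caseInsensitive = UnmatchedNames(phoneNames, mailNames, StringComparer.OrdinalIgnoreCase);

        Console.WriteLine($"Case-sensitive: {string.Join(", ", caseSensitive)}");
        Console.WriteLine($"Case-insensitive count: {caseInsensitive.Count}");
        // Prints:
        // Case-sensitive: Bob, Jim
        // Case-insensitive count: 0
    }
}
```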
GroupJoin
For GroupJoin we’ll use some of the data from the previous examples and some new data. Our new contact items and a slightly more complex model look like this:
    public static IEnumerable<MailContactItem> complexMailBook
    {
        get
        {
            yield return new MailContactItem("Rafal", "rafal@(thisblogdomain).net");
            yield return new MailContactItem("John", "john@somedomain.com");
            yield return new MailContactItem("bob", "bob@somedomain.com");
            yield return new MailContactItem("jim", "jim@somedomain.com");
            yield return new MailContactItem("John", "john@otherdomain.com");
            yield return new MailContactItem("bob", "bob@otherdomain.com");
            yield return new MailContactItem("jim", "jim@otherdomain.com");
            yield return new MailContactItem("John", "john@sampledomain.com");
            yield return new MailContactItem("bob", "bob@sampledomain.com");
            yield return new MailContactItem("jim", "jim@sampledomain.com");
        }
    }

    public class ComplexAddressBookItem
    {
        public string Name { get; set; }
        public List<string> Emails { get; set; }
        public string PhoneNumber { get; set; }

        public ComplexAddressBookItem(string _name, string _number, IEnumerable<string> _emails)
        {
            Name = _name;
            PhoneNumber = _number;
            Emails = _emails.ToList();
        }

        public override string ToString()
        {
            return Emails.Aggregate($"Name: {Name}. PhoneNumber: {PhoneNumber}. Emails:",
                (accumulator, email) => $"{accumulator}{Environment.NewLine}-{email}");
        }
    }
Let’s try to use Join, just as in previous examples. After all, it shouldn’t be complicated.
    var addressBook = phoneBook.Join(
        inner: complexMailBook,
        outerKeySelector: phoneBookItem => phoneBookItem.Name,
        innerKeySelector: complexMailBookItem => complexMailBookItem.Name,
        resultSelector: (phoneBookItem, complexMailBookItem) =>
            new ComplexAddressBookItem(phoneBookItem.Name, phoneBookItem.PhoneNumber,
                new string[] { complexMailBookItem.Email }), /*No way to put multiple emails here :(*/
        comparer: StringComparer.InvariantCultureIgnoreCase
    );
And what happened? When we print the results to the console we’ll see this:
    Name: Rafal. PhoneNumber: 555 555. Emails:
    -rafal@(thisblogdomain).net
    Name: John. PhoneNumber: 555 666. Emails:
    -john@somedomain.com
    Name: John. PhoneNumber: 555 666. Emails:
    -john@otherdomain.com
    Name: John. PhoneNumber: 555 666. Emails:
    -john@sampledomain.com
    Name: Bob. PhoneNumber: 555 777. Emails:
    -bob@somedomain.com
    Name: Bob. PhoneNumber: 555 777. Emails:
    -bob@otherdomain.com
    Name: Bob. PhoneNumber: 555 777. Emails:
    -bob@sampledomain.com
    Name: Jim. PhoneNumber: 555 888. Emails:
    -jim@somedomain.com
    Name: Jim. PhoneNumber: 555 888. Emails:
    -jim@otherdomain.com
    Name: Jim. PhoneNumber: 555 888. Emails:
    -jim@sampledomain.com
It’s not exactly what we were trying to achieve. The reason is simple: Join behaves like an SQL inner join, so it produces one result for every matching pair, and a contact with three emails ends up as three separate entries. Our new complexMailBook needs something that matches a single object to a collection of other objects (just like a 1:* relationship in a database). In LINQ this is the job of GroupJoin: it returns each outer item once, together with all of its matching inner items, and, like an SQL left outer join, it keeps outer items that have no match at all (they simply get an empty collection). So let’s use the right tool for the job now.
    var complexAddressBook = phoneBook.GroupJoin(
        inner: complexMailBook,
        outerKeySelector: phoneBookItem => phoneBookItem.Name,
        innerKeySelector: mailBookItem => mailBookItem.Name,
        resultSelector: (phoneBookItem/*type of: PhoneContactItem*/, mailBookItems/*type of: IEnumerable<MailContactItem>*/) =>
            new ComplexAddressBookItem(phoneBookItem.Name, phoneBookItem.PhoneNumber, mailBookItems.Select(x => x.Email)),
        comparer: StringComparer.InvariantCultureIgnoreCase
    );
As you can see, GroupJoin looks almost the same as Join. The main difference is the second parameter of the resultSelector lambda: instead of a single object it is a collection of matching objects. Apart from that there are no important differences in invoking GroupJoin. And this allowed us to get the results that we wanted:
    Name: Rafal. PhoneNumber: 555 555. Emails:
    -rafal@(thisblogdomain).net
    Name: John. PhoneNumber: 555 666. Emails:
    -john@somedomain.com
    -john@otherdomain.com
    -john@sampledomain.com
    Name: Bob. PhoneNumber: 555 777. Emails:
    -bob@somedomain.com
    -bob@otherdomain.com
    -bob@sampledomain.com
    Name: Jim. PhoneNumber: 555 888. Emails:
    -jim@somedomain.com
    -jim@otherdomain.com
    -jim@sampledomain.com
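One more thing worth seeing in code: GroupJoin keeps outer items even when nothing matches them, which makes it a handy building block for a flat, SQL-style left outer join. The sketch below combines it with SelectMany and DefaultIfEmpty. The names LeftJoinSketch and LeftJoin, the “(no email)” placeholder and the sample data (including Ann, who has no email) are assumptions made for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeftJoinSketch
{
    // Hypothetical helper: one output line per (contact, email) pair,
    // and a placeholder line for contacts without any email.
    public static List<string> LeftJoin(
        IEnumerable<(string Name, string Phone)> phones,
        IEnumerable<(string Name, string Email)> mails)
    {
        return phones
            .GroupJoin(
                inner: mails,
                outerKeySelector: phone => phone.Name,
                innerKeySelector: mail => mail.Name,
                // matches is empty (not null) when a contact has no email.
                resultSelector: (phone, matches) => new { phone, matches },
                comparer: StringComparer.OrdinalIgnoreCase)
            .SelectMany(
                // DefaultIfEmpty turns an empty group into a single placeholder value.
                pair => pair.matches.Select(mail => mail.Email).DefaultIfEmpty("(no email)"),
                (pair, email) => $"{pair.phone.Name}: {email}")
            .ToList();
    }

    public static void Main()
    {
        var phones = new[]
        {
            (Name: "Rafal", Phone: "555 555"),
            (Name: "John", Phone: "555 666"),
            (Name: "Ann", Phone: "555 999")
        };
        var mails = new[]
        {
            (Name: "Rafal", Email: "rafal@example.com"),
            (Name: "john", Email: "john@example.com"),
            (Name: "john", Email: "john@example.org")
        };

        foreach (var line in LeftJoin(phones, mails))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Rafal: rafal@example.com
        // John: john@example.com
        // John: john@example.org
        // Ann: (no email)
    }
}
```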
I hope LINQ Join and GroupJoin don’t hide any secrets from you anymore and that you understand them clearly now. Since we started with a little warmup with the GroupBy method, let’s move on to a cooldown with some of its more fun (less boring) overloads.
Advanced GroupBy
GroupBy has 8 overloads (if you count only the IEnumerable extensions), but don’t worry, they can be matched (Joined 😉 ) in pairs where one has a comparer parameter and the other does not. Since we’ve already covered basic GroupBy, there are only 3 pairs left. Let’s use our old collections and models for this section.
The first one is simple: it introduces the element selector parameter. It lets us pass a delegate that decides what is stored in our IGrouping for each item, e.g. a single property or a value returned by some method. A basic example:
    var elementSelectorGroup = complexMailBook.GroupBy(
        keySelector: x => x.Name,
        elementSelector: x => x.Email,
        comparer: StringComparer.InvariantCultureIgnoreCase);

    foreach (var group in elementSelectorGroup)
    {
        Console.WriteLine($"Items in group {group.Key}:");
        foreach (var item in group)
        {
            // item is a plain string (the email); "- " matches the output shown below.
            Console.WriteLine($"- {item}");
        }
    }
And in console we’ll get those results:
    Items in group Rafal:
    - rafal@(thisblogdomain).net
    Items in group John:
    - john@somedomain.com
    - john@otherdomain.com
    - john@sampledomain.com
    Items in group bob:
    - bob@somedomain.com
    - bob@otherdomain.com
    - bob@sampledomain.com
    Items in group jim:
    - jim@somedomain.com
    - jim@otherdomain.com
    - jim@sampledomain.com
The next GroupBy overload introduces a result selector, which allows us to project each of our groups to a new object of our choice. This selector takes two parameters: the key and the grouped collection (of type IEnumerable). This delegate can be really fun when combined with the Aggregate method to choose the “best” object in a group. But in our simple context it could be used like this:
    var resultSelectorGroupByResults = complexMailBook.GroupBy(
        keySelector: x => x.Name,
        resultSelector: (key, mailContactItemsGroup) =>
            new ComplexAddressBookItem(key, "Empty number", mailContactItemsGroup.Select(mailContactItem => mailContactItem.Email)),
        comparer: StringComparer.InvariantCultureIgnoreCase
    );

    foreach (var item in resultSelectorGroupByResults)
    {
        Console.WriteLine(item.ToString());
    }
Which will print this in our console:
    Name: Rafal. PhoneNumber: Empty number. Emails:
    -rafal@(thisblogdomain).net
    Name: John. PhoneNumber: Empty number. Emails:
    -john@somedomain.com
    -john@otherdomain.com
    -john@sampledomain.com
    Name: bob. PhoneNumber: Empty number. Emails:
    -bob@somedomain.com
    -bob@otherdomain.com
    -bob@sampledomain.com
    Name: jim. PhoneNumber: Empty number. Emails:
    -jim@somedomain.com
    -jim@otherdomain.com
    -jim@sampledomain.com
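As for using the result selector together with Aggregate to pick the “best” object in a group, here is one possible sketch. What “best” means is entirely up to you; in this made-up example (the names BestInGroupSketch and ShortestEmailPerName are mine) it is simply the shortest email address for each name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BestInGroupSketch
{
    // Hypothetical helper: for every name keeps only the shortest email.
    public static List<string> ShortestEmailPerName(IEnumerable<(string Name, string Email)> mails)
    {
        return mails
            .GroupBy(
                keySelector: mail => mail.Name,
                resultSelector: (name, group) =>
                {
                    // A group is never empty, so Aggregate without a seed is safe here.
                    // On a tie the earlier email wins because of the strict comparison.
                    string best = group
                        .Select(mail => mail.Email)
                        .Aggregate((shortest, next) => next.Length < shortest.Length ? next : shortest);
                    return $"{name}: {best}";
                },
                comparer: StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    public static void Main()
    {
        var mails = new[]
        {
            (Name: "John", Email: "john@somedomain.com"),
            (Name: "john", Email: "j@x.io"),
            (Name: "Bob", Email: "bob@example.com")
        };

        foreach (var line in ShortestEmailPerName(mails))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // John: j@x.io
        // Bob: bob@example.com
    }
}
```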
We could achieve exactly the same result with the last GroupBy overload, which takes both an element selector and a result selector. Since we only need the emails from our groups, we can use this:
    var resultAndElementSelectorGroupBy = complexMailBook.GroupBy(
        keySelector: x => x.Name,
        elementSelector: x => x.Email,
        resultSelector: (key, emailCollection) => new ComplexAddressBookItem(key, "Empty number", emailCollection),
        comparer: StringComparer.InvariantCultureIgnoreCase
    );
Summary
Joins are fun and pretty easy if you take a while to understand how they work and how their parameters relate to each other. Join and GroupJoin are not the most commonly used LINQ methods, but they can be useful, and in the right place they can be quite powerful too. They also have one big, great even, virtue: they can be used with LINQ to Entities (excluding complex projections and custom comparers), Parallel LINQ and other providers, and if used correctly they can generate e.g. a better SQL query than clumsy LINQ chains assembled by someone who knows just the Where, Select and ToList methods (send him or her a link to my blog in that case 😉 ).
I hope you have learned something new and that this long post made you a little bit smarter. If you don’t want to miss my next posts, you can follow my Facebook fanpage or Twitter. See you soon!
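Since Parallel LINQ came up, here is a minimal sketch of the same kind of join running through PLINQ. It assumes in-memory data and made-up names (ParallelJoinSketch, ParallelJoin); note that both sides are converted with AsParallel, and that the result is sorted afterwards because PLINQ does not guarantee the output order by default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelJoinSketch
{
    // Hypothetical helper: case-insensitive join executed with PLINQ.
    public static List<string> ParallelJoin(
        IEnumerable<(string Name, string Phone)> phones,
        IEnumerable<(string Name, string Email)> mails)
    {
        return phones.AsParallel()
            .Join(
                mails.AsParallel(),
                phone => phone.Name,
                mail => mail.Name,
                (phone, mail) => $"{phone.Name}: {mail.Email}",
                StringComparer.OrdinalIgnoreCase)
            // Sort to get a deterministic order out of the parallel query.
            .OrderBy(line => line, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        var phones = new[] { (Name: "Rafal", Phone: "555 555"), (Name: "Bob", Phone: "555 777") };
        var mails = new[] { (Name: "bob", Email: "bob@example.com"), (Name: "Rafal", Email: "rafal@example.com") };

        foreach (var line in ParallelJoin(phones, mails))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // Bob: bob@example.com
        // Rafal: rafal@example.com
    }
}
```

For collections this small the parallel version will not be any faster; it only shows that the method shape stays the same.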