Python set — a powerful collection every developer should use in 2022

What are Python sets? What is the difference between Python sets and Python lists? When to use Python sets and when to use Python lists?

INTRO

Three of the most widely used Python collections are the list, the dictionary and the set. While lists and dictionaries appear in almost every piece of existing Python code, the set sometimes feels a bit underrated, though it shouldn't! The Python set is an incredibly useful collection that lets you write efficient and beautiful code thanks to its special properties and functionality. Mastering sets can noticeably improve your code and shorten development time. So, let's dive deep into Python sets without further ado!

Feel free to watch my video on Python sets, which walks through the code examples, for a better understanding.

Let's start with a motivating task: how do you get the unique values from a list that might contain duplicates?

Well, with a Python set you can do it in a single line of code! Just create a set from the given list:

sets_img_1.png
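In text form, the idea can be sketched roughly as follows. The sample `names` list is an assumption made up for illustration; only the `unique_names` name comes from the example itself.

```python
# Hypothetical sample data: a list of names with duplicates.
names = ["Anna", "Ben", "Anna", "Chris", "Ben"]

# Creating a set from the list drops the duplicates.
unique_names = set(names)

# The printed element order may differ from run to run.
print(unique_names)
print(len(unique_names))  # 3
```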

Note that our unique_names set now contains no duplicates. So what happened here? You just saw one of the two main properties of a set in action:

Set elements are unique.

This leads us to a short discussion about two main set properties.


Set Properties

A set has two main properties that clearly distinguish it from a list:

  • Set elements are unique
  • Set elements are unordered.

At first glance it might seem weird - why would someone use a collection that does not support order and does not allow duplicates when we have the Python list, which has neither of these limitations?

The answer is that these are actually not limitations, but important properties that save us from potential bugs and allow very special functionality.

Let me show you a couple of examples.

Example 1. Let's say you want to store all the cities that have ever hosted the Olympic Games 🏊 💃 🎿. It makes perfect sense to store these cities in a set rather than in a list.

  • We want to make sure that even if a city has hosted the Olympics more than once since 1896, it will only appear once in our collection.
  • There is no sense in ordering these cities; it's not as if some cities are better than others 😅

Example 2. We want to store all the fruits that grow in Israel 🍊 🍋 🍏. The same principles apply here:

  • We don't want any duplicates in our fruits collection.
  • There is no special order to represent here; it simply does not exist.

Now let's see the uniqueness and absence-of-order properties in action:


Uniqueness

In the following code snippet we have a list of rainy months in Israel from the last 3 years. Since Israel usually gets its water from the sky in the same few winter months every year, there are duplicates in rainy_months_list.

If we want to get a unique set of rainy months in Israel, the only thing we need to do is create a set and initialize it from rainy_months_list.

sets_img_2.png
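A minimal text sketch of this step could look like the following. The months in `rainy_months_list` and the `rainy_months` variable name are assumptions chosen for illustration, not the original data.

```python
# Hypothetical data: rainy months recorded over three winters.
rainy_months_list = [
    "December", "January", "February",           # first year
    "January", "February", "March",              # second year
    "November", "December", "January",           # third year
]

# Initializing a set from the list keeps each month only once.
rainy_months = set(rainy_months_list)

print(len(rainy_months_list))  # 9
print(len(rainy_months))       # 5
```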

Besides automatically removing all duplicates from the collection it has been initialized with, a set also maintains the uniqueness of its elements afterwards, which saves us from potential mistakes. In the following code snippet you see an attempt to add a duplicate element to the set. After this attempt the set remains unchanged.

sets_img_3.png
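As a rough sketch, assuming a small hypothetical `rainy_months` set, adding an element that is already present with the built-in `add` method is silently ignored:

```python
# Hypothetical sample set, used only for illustration.
rainy_months = {"December", "January", "February"}
size_before = len(rainy_months)

# "January" is already in the set, so this call changes nothing
# and raises no error.
rainy_months.add("January")

print(len(rainy_months) == size_before)  # True

# Adding a genuinely new element does grow the set.
rainy_months.add("March")
print(len(rainy_months))  # 4
```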

Now that we have seen how a set maintains its uniqueness property, let's move on to the second set property: its elements are unordered.


Absence of order

Since set elements have no defined order, it makes no sense to access an element at a specific index in a set. Hence, an attempt to do so raises an exception (a TypeError):

sets_img_4.png
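The failure can be reproduced with a short sketch like this one. The set contents and the `error_message` variable are assumptions for illustration; the exact wording of the error text can vary between Python versions, so it is printed rather than hard-coded.

```python
# Hypothetical sample set, used only for illustration.
rainy_months = {"December", "January", "February"}

error_message = None
try:
    # Sets have no positions, so indexing is not supported.
    first_month = rainy_months[0]
except TypeError as error:
    error_message = str(error)

print(error_message)
```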

But, of course, we can always iterate over the elements of a set just like over any other collection, though the order in which the elements are returned during iteration is not guaranteed and can change:

sets_img_5.png
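Here is a small sketch of iterating over a set, again with hypothetical contents. When a predictable order is needed, one common option is to pass the set to the built-in `sorted` function, which returns a new list.

```python
# Hypothetical sample set, used only for illustration.
rainy_months = {"December", "January", "February"}

visited = []
for month in rainy_months:
    # The order in which months arrive here is not guaranteed.
    visited.append(month)
    print(month)

# sorted() returns a list in a well-defined (here alphabetical) order.
ordered_months = sorted(rainy_months)
print(ordered_months)  # ['December', 'February', 'January']
```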


Set special functionality

Now that we have seen the set properties, let's take a look at the powerful functionality the Python set implements:

  1. Intersection
  2. Union
  3. Difference

If you are familiar with Set Theory, these are exactly the operations defined there. If you have never heard of Set Theory, this diagram will help you:

sets_img_6.png

Now let's look at these operations in action with code examples. First of all, we'll create 3 sets to work with:

sets_img_7.png

  • weekdays contains all 7 days of the week
  • sport_days describes the days on which I do my workouts
  • lecture_days are weekdays when I teach my students
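In text form, the three sets might look like this. The variable names come from the list above, but the specific days placed in `sport_days` and `lecture_days` are assumptions chosen for illustration.

```python
# All seven days of the week.
weekdays = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
}

# Hypothetical schedule: the actual days are assumptions.
sport_days = {"Monday", "Wednesday", "Friday"}
lecture_days = {"Monday", "Tuesday", "Wednesday"}

print(len(weekdays))  # 7
```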

We are going to use set operations to get some insights from these three sets.


Intersection

On which days do I give lectures in the morning AND do my workouts in the evening?

To answer this question, we need to get the elements that appear in both sport_days and lecture_days, i.e., we need the intersection of these two sets. Happily, we can do it with just one line of code using the intersection method:

sets_img_8.png
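A minimal sketch of this call, assuming hypothetical contents for the two sets and a made-up result name `sport_and_lecture_days`:

```python
# Hypothetical schedule: the actual days are assumptions.
sport_days = {"Monday", "Wednesday", "Friday"}
lecture_days = {"Monday", "Tuesday", "Wednesday"}

# Elements that appear in BOTH sets.
sport_and_lecture_days = sport_days.intersection(lecture_days)
print(sport_and_lecture_days)  # contains "Monday" and "Wednesday"

# The & operator is an equivalent shorthand for two sets.
same_result = sport_days & lecture_days
print(same_result == sport_and_lecture_days)  # True
```

Note that intersection returns a new set and leaves both original sets unchanged.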


Union

On which days do I have something on my schedule, i.e., what are my busy days?

Well, these are the days that appear in sport_days, in lecture_days, or in both, which is exactly the definition of union. Let's store these days as busy_days. Happily, set also supports a union method:

sets_img_9.png
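Sketched in text, with the same hypothetical schedule as an assumption, the union could be computed like this:

```python
# Hypothetical schedule: the actual days are assumptions.
sport_days = {"Monday", "Wednesday", "Friday"}
lecture_days = {"Monday", "Tuesday", "Wednesday"}

# Elements that appear in at least one of the two sets.
busy_days = sport_days.union(lecture_days)
print(len(busy_days))  # 4

# The | operator is an equivalent shorthand for two sets.
same_result = sport_days | lecture_days
print(same_result == busy_days)  # True
```

Days that appear in both sets, such as "Monday" here, still show up only once in the result.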


Difference

I'm planning a short one-day hike with my friends. What are those days when I am totally free and have nothing on my schedule?

By now you probably already see that these free days are exactly the difference between weekdays and busy_days. In other words, we are looking for days that appear in weekdays but don't appear in busy_days:

sets_img_10.png
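As a rough sketch, assuming hypothetical contents for `busy_days` and a made-up result name `free_days`, the difference method does the job:

```python
# All seven days of the week.
weekdays = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
}

# Hypothetical busy days: the actual days are assumptions.
busy_days = {"Monday", "Tuesday", "Wednesday", "Friday"}

# Elements of weekdays that do NOT appear in busy_days.
free_days = weekdays.difference(busy_days)
print(len(free_days))  # 3

# The - operator is an equivalent shorthand for two sets.
same_result = weekdays - busy_days
print(same_result == free_days)  # True
```

Unlike intersection and union, difference depends on the order of its operands: `busy_days.difference(weekdays)` would be an empty set here.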

Epilog

Congrats! You've added a great tool to your Python toolbox!

Now it's time for you to apply the knowledge you just acquired. Consider your latest Python projects. Was there any data that had set properties but was implemented using lists or other collections? If so, try to rewrite your code to use a set wherever it is suitable.


You can find all the code presented here on my GitHub.

Thanks for reading!