Grokking the Unpacking Operator

Python
Unpacking Operator
Method Chaining
Author

Ricky Macharm

Published

September 14, 2024

Introduction

Imagine for a moment that it is your spouse’s birthday and you planned to buy them a gift - a box with some selected items: jewelry, a watch, perfume, and airpods. You gleefully look at her/him smile as s/he unboxes it, carefully revealing each item one by one.

Similarly, in Python, unpacking allows us to unbox the elements of an iterable such as a list, tuple, or dictionary into individual variables.

In this article, we will get into the weeds of how python unboxes items, properly known as unpacking.

The Basics of Unpacking

Let’s start simple. Unpacking is the process of extracting individual elements from a collection like a list or a tuple and assigning them to variables. Think of it like unboxing your wife’s birthday gift. Here’s how we do that in Python:

gift_box = ['jewelry', 'watch', 'perfume', 'airpods']
item1, item2, item3, item4 = gift_box

print(item1)  # 'jewelry'
print(item2)  # 'watch'
print(item3)  # 'perfume'
print(item4)  # 'airpods'
jewelry
watch
perfume
airpods

Here, we “unbox” the list by assigning each item to a separate variable. Now, each variable contains one of the items from the gift_box.

What Happens if There Are Too Many or Too Few Items?

What happens if the gift box has more items than variables?

gift_box = ['jewelry', 'watch', 'perfume', 'airpods', 'flowers']
item1, item2, item3 = gift_box  # Error: too many values to unpack
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 2
      1 gift_box = ['jewelry', 'watch', 'perfume', 'airpods', 'flowers']
----> 2 item1, item2, item3 = gift_box  # Error: too many values to unpack

ValueError: too many values to unpack (expected 3)

We get an error! Python doesn’t know how to fit five items into just three variables. Likewise, if we had too few items, we’d also get an error:

gift_box = ['jewelry', 'watch']
item1, item2, item3 = gift_box  # Error: not enough values to unpack
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 2
      1 gift_box = ['jewelry', 'watch']
----> 2 item1, item2, item3 = gift_box  # Error: not enough values to unpack

ValueError: not enough values to unpack (expected 3, got 2)

Unpacking with the * Operator

Now, imagine you want to only unpack the first and last items from the gift box, while ignoring the rest. This is where the * operator comes in handy:

gift_box = ['jewelry', 'watch', 'perfume', 'airpods', 'flowers']
first, *middle, last = gift_box

print(first)  # 'jewelry'
print(last)   # 'flowers'
print(middle)  # ['watch', 'perfume', 'airpods']
jewelry
flowers
['watch', 'perfume', 'airpods']

The * operator captures the remaining elements and packs them into a list. You can even discard them by using _:

first, *_, last = gift_box
print(first, last)  # 'jewelry', 'flowers'
jewelry flowers

This method is incredibly useful when you only care about a few elements in a list or tuple.

Unpacking Dictionaries with **

Now let’s move on to unpacking dictionaries. Imagine the gift contains a dictionary with item names and their respective values.

gift_details = {'jewelry': 'gold', 'watch': 'rolex', 'perfume': 'Chanel'}

You can unpack dictionaries using **. This is especially useful when you want to merge dictionaries, like adding more gifts in your spouse’s box of gifts:

extra_gift = {'airpods': 'pro'}
merged_gifts = {**gift_details, **extra_gift}
print(merged_gifts)
# {'jewelry': 'gold', 'watch': 'rolex', 'perfume': 'Chanel', 'airpods': 'pro'}
{'jewelry': 'gold', 'watch': 'rolex', 'perfume': 'Chanel', 'airpods': 'pro'}

The ** operator spreads out the dictionary and allows you to combine multiple dictionaries into one.

Using *args and **kwargs in Functions

You might have seen the terms *args and **kwargs before in Python. These are used when defining functions to accept a variable number of arguments and keyword arguments.

  • *args lets you pass a variable number of positional arguments to a function.
  • **kwargs lets you pass a variable number of keyword arguments.

Let’s start with *args:

def calculate_total(*args):
    return sum(args)

print(calculate_total(10, 20, 30))  # 60
print(calculate_total(5, 15))  # 20
60
20

Here, *args captures all the positional arguments into a tuple. You can pass as many arguments as you like.

Next, let’s look at **kwargs:

def print_gift_details(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

print_gift_details(jewelry="gold", watch="rolex", perfume="Chanel")
# jewelry: gold
# watch: rolex
# perfume: Chanel
jewelry: gold
watch: rolex
perfume: Chanel

With **kwargs, all keyword arguments are captured into a dictionary, which you can iterate over.

Combining *args and **kwargs

You can combine both in a function to accept any number of positional and keyword arguments:

def gift_summary(*args, **kwargs):
    print("Items:", args)
    print("Details:", kwargs)

gift_summary('jewelry', 'watch', jewelry="gold", watch="rolex")
# Items: ('jewelry', 'watch')
# Details: {'jewelry': 'gold', 'watch': 'rolex'}
Items: ('jewelry', 'watch')
Details: {'jewelry': 'gold', 'watch': 'rolex'}

Method Chaining

Method chaining in pandas is a technique where multiple methods are called sequentially on a DataFrame or Series in a single statement. Each method operates on the output of the previous one, making the code concise and readable. It helps avoid intermediate variables and enables efficient data processing.

let us create a dataframe from the numpy library:

import pandas as pd
import numpy as np 
# Setting up random number generator
rng = np.random.default_rng(seed=42)

# Generating random data for open, high, low, close prices
data = {
    'Open': rng.uniform(low=100, high=200, size=10),
    'High': rng.uniform(low=200, high=300, size=10),
    'Low': rng.uniform(low=50, high=100, size=10),
    'Close': rng.uniform(low=100, high=200, size=10)
}

# Creating the DataFrame
df = pd.DataFrame(data).round(4)
df1=df2=df.copy() # making copies to use for examples later. 
df
Open High Low Close
0 177.3956 237.0798 87.9044 174.4762
1 143.8878 292.6765 67.7263 196.7510
2 185.8598 264.3865 98.5349 132.5825
3 169.7368 282.2762 94.6561 137.0460
4 109.4177 244.3414 88.9192 146.9556
5 197.5622 222.7239 59.7319 118.9471
6 176.1140 255.4585 73.3361 112.9922
7 178.6064 206.3817 52.1902 147.5705
8 112.8114 282.7631 57.7145 122.6909
9 145.0386 263.1664 84.1524 166.9814

To add a new feature like calculating the moving average of the Close price in a DataFrame, you can use the rolling() function in pandas. Here’s how to calculate the moving average (for example, a 3-period moving average) of the Close price and add it as a new column to the DataFrame.

df1['SMA'] = df1['Close'].rolling(window=3).mean()
df1
Open High Low Close SMA
0 177.3956 237.0798 87.9044 174.4762 NaN
1 143.8878 292.6765 67.7263 196.7510 NaN
2 185.8598 264.3865 98.5349 132.5825 167.936567
3 169.7368 282.2762 94.6561 137.0460 155.459833
4 109.4177 244.3414 88.9192 146.9556 138.861367
5 197.5622 222.7239 59.7319 118.9471 134.316233
6 176.1140 255.4585 73.3361 112.9922 126.298300
7 178.6064 206.3817 52.1902 147.5705 126.503267
8 112.8114 282.7631 57.7145 122.6909 127.751200
9 145.0386 263.1664 84.1524 166.9814 145.747600

To achieve the same using method chaining:

df2 = (df2
.assign(SMA=lambda x: x['Close'].rolling(window=3)
.mean())
)
df2
Open High Low Close SMA
0 177.3956 237.0798 87.9044 174.4762 NaN
1 143.8878 292.6765 67.7263 196.7510 NaN
2 185.8598 264.3865 98.5349 132.5825 167.936567
3 169.7368 282.2762 94.6561 137.0460 155.459833
4 109.4177 244.3414 88.9192 146.9556 138.861367
5 197.5622 222.7239 59.7319 118.9471 134.316233
6 176.1140 255.4585 73.3361 112.9922 126.298300
7 178.6064 206.3817 52.1902 147.5705 126.503267
8 112.8114 282.7631 57.7145 122.6909 127.751200
9 145.0386 263.1664 84.1524 166.9814 145.747600

This seems to be working just fine. However, sometimes we might want to create a function that could insert multiple features at once based off of the users’ choice. An example is given below:

def add_multiple_smas(df, col, *windows):
    for window in windows:
      df[f'SMA_{window}'] = df[col].rolling(window=window).mean()
    return df

# Example usage
add_multiple_smas(df1, 'Close', 2, 3, 4)
Open High Low Close SMA SMA_2 SMA_3 SMA_4
0 177.3956 237.0798 87.9044 174.4762 NaN NaN NaN NaN
1 143.8878 292.6765 67.7263 196.7510 NaN 185.61360 NaN NaN
2 185.8598 264.3865 98.5349 132.5825 167.936567 164.66675 167.936567 NaN
3 169.7368 282.2762 94.6561 137.0460 155.459833 134.81425 155.459833 160.213925
4 109.4177 244.3414 88.9192 146.9556 138.861367 142.00080 138.861367 153.333775
5 197.5622 222.7239 59.7319 118.9471 134.316233 132.95135 134.316233 133.882800
6 176.1140 255.4585 73.3361 112.9922 126.298300 115.96965 126.298300 128.985225
7 178.6064 206.3817 52.1902 147.5705 126.503267 130.28135 126.503267 131.616350
8 112.8114 282.7631 57.7145 122.6909 127.751200 135.13070 127.751200 125.550175
9 145.0386 263.1664 84.1524 166.9814 145.747600 144.83615 145.747600 137.558750

Attempting to accomplish a similar feat using method chaining will result in an error because the .assign method expects keyword arguments. The best way is to use the unpacking operator to achieve this with the help of dictionary comprehension to loop through the different values for the SMAs.

def add_multiple_smas(df, col, *windows, **kwargs):
    
    # Use assign to add the columns in a chained fashion
    return df.assign(**{f'SMA_{window}': df[col].rolling(window=window, **kwargs).mean() for window in windows})

# Example usage with method chaining
add_multiple_smas(df2, 'Close', 2, 3, 4)
Open High Low Close SMA SMA_2 SMA_3 SMA_4
0 177.3956 237.0798 87.9044 174.4762 NaN NaN NaN NaN
1 143.8878 292.6765 67.7263 196.7510 NaN 185.61360 NaN NaN
2 185.8598 264.3865 98.5349 132.5825 167.936567 164.66675 167.936567 NaN
3 169.7368 282.2762 94.6561 137.0460 155.459833 134.81425 155.459833 160.213925
4 109.4177 244.3414 88.9192 146.9556 138.861367 142.00080 138.861367 153.333775
5 197.5622 222.7239 59.7319 118.9471 134.316233 132.95135 134.316233 133.882800
6 176.1140 255.4585 73.3361 112.9922 126.298300 115.96965 126.298300 128.985225
7 178.6064 206.3817 52.1902 147.5705 126.503267 130.28135 126.503267 131.616350
8 112.8114 282.7631 57.7145 122.6909 127.751200 135.13070 127.751200 125.550175
9 145.0386 263.1664 84.1524 166.9814 145.747600 144.83615 145.747600 137.558750

Method chaining in pandas allows for more concise, readable, and functional-style code by performing multiple transformations in a single statement without creating intermediate variables. This makes the code more compact and often easier to follow when handling complex data transformations. It can also help avoid side effects by keeping transformations within the same flow, making it easier to debug and maintain.

However, it’s also a matter of preference. Some developers prefer method chaining for its elegance and simplicity, while others prefer using intermediate variables for clarity, especially when dealing with more complex logic, as it can be easier to inspect the data at different stages of transformation. The choice depends on the coding style that the person or team finds most understandable and maintainable.

Conclusion

Unpacking is a powerful feature in Python that helps you write cleaner and more efficient code. Whether you’re unpacking lists, tuples, or dictionaries, or using *args and **kwargs in functions, this feature allows for flexible and dynamic code. Moreover, unpacking can be used with popular libraries like Pandas and NumPy to streamline data manipulation.

So, the next time you find yourself opening a gift box, remember: Python unpacking is just like unboxing—taking out each item, one by one, and making it yours!