Hi there, I’m excited to share a Python coding project I recently worked on: filtering dictionary keys based on specific conditions. I needed to remove keys from a dictionary when they didn’t match my criteria. In this case, that meant keeping only keys that are exactly three words long and then filtering further to the keys with the maximum value. Let me walk you through what was wrong with my original approach, how I corrected it, and how I extended it for more flexible usage.
What Was Wrong with the Original Code
My initial attempt contained several issues that prevented it from working correctly. Here’s a breakdown of what went wrong:
Iteration Problem
In the original code, I tried to iterate over a list of dictionaries like this:
for i in range(1, len(b)+1):
    for k,v in i.items():
The problem was that i is an integer, not a dictionary, so calling i.items() fails with an AttributeError. I learned that I should iterate directly over the dictionaries in the list.
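To make the fix concrete, here is a minimal sketch of iterating directly over the dictionaries. The list name records and its contents are made up for illustration and are not the data from my project.

```python
# Hypothetical sample data, used only for this illustration.
records = [
    {"nas server": 0.16, "nas server replic": 0.28},
    {"delete": 0.18, "delete invalid data": 0.29},
]

# range() yields integers, and an integer has no .items() method.
# Iterating over the list itself yields each dictionary in turn.
pairs = []
for d in records:
    for k, v in d.items():
        pairs.append((k, v))

# enumerate() supplies the position as well, if an index is still needed.
for i, d in enumerate(records):
    print(i, len(d))
```

If the position in the list matters, enumerate() is usually a cleaner choice than range(len(...)).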
Indexing and Regular Expression Misuse
I attempted to use the expression k[i] inside a regular expression function (re.findall()), which incorrectly indexed the key string instead of processing it. Furthermore, the condition was built incorrectly due to misplaced parentheses and comparison operators. The goal was to evaluate the entire key string, not just a part of it.
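The difference between indexing a key and processing the whole key can be sketched as follows. The pattern \S+ is my own choice for this illustration, not the pattern from the original code, and str.split() is shown alongside it because it gives the same count without a regular expression.

```python
import re

key = "delete option snapshot"  # sample key from the test data

# Indexing a string returns a single character, not a word.
first_char = key[0]

# Counting words by applying a regular expression to the whole key.
# r"\S+" (runs of non-whitespace) is an assumed pattern for this sketch.
word_count_re = len(re.findall(r"\S+", key))

# str.split() with no arguments splits on any run of whitespace.
word_count_split = len(key.split())

print(first_char, word_count_re, word_count_split)
```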
Deletion During Iteration
Finally, deleting keys from a dictionary while iterating over it is unsafe. In Python 3 it raises a RuntimeError (dictionary changed size during iteration), and the same pattern on a list can silently skip items. A better and safer approach is to build a new dictionary that contains only the keys we want to keep.
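As a small illustration of the deletion problem, the sketch below triggers the error on a copy of some made-up data and then shows the safer comprehension-based alternative. The variable names are hypothetical.

```python
# Made-up sample data for this illustration.
scores = {"done": 0.11, "done auto delete": 0.19, "overwhelmed sub": 0.19}

# Unsafe: deleting from the dictionary that is being iterated over.
error_message = None
probe = dict(scores)  # work on a copy so the original stays intact
try:
    for k in probe:
        if len(k.split()) != 3:
            del probe[k]
except RuntimeError as exc:
    error_message = str(exc)

# Safe: build a new dictionary that holds only the keys to keep.
kept = {k: v for k, v in scores.items() if len(k.split()) == 3}

print(error_message)
print(kept)
```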
Step-by-Step Corrected Example
I refactored my code by following a clear sequence of steps for each dictionary in the list:
- Filter keys based on word count: I kept only the keys that have exactly 3 words.
- Determine the maximum value: Among the filtered keys, I chose those with the highest value.
- Construct a new dictionary: Finally, I built a new list of dictionaries that contained only the keys meeting the criteria.
Base Function (Exact Reproduction of Desired Output)
Below is the corrected code:
def filter_max_three_word_keys(dicts_list):
    """
    For each dictionary in dicts_list:
    - Keep only the keys that are exactly 3 words long.
    - Among these, find the maximum value.
    - Keep only keys that have this maximum value.
    """
    new_dicts = []
    for d in dicts_list:
        # Filter keys with exactly 3 words.
        filtered = {k: v for k, v in d.items() if len(k.split()) == 3}
        # If no key meets the condition, append an empty dictionary.
        if not filtered:
            new_dicts.append({})
            continue
        # Find the maximum value among the filtered keys.
        max_val = max(filtered.values())
        # Keep only keys with the maximum value.
        top_keys = {k: v for k, v in filtered.items() if v == max_val}
        new_dicts.append(top_keys)
    return new_dicts

# Test input based on the blog post
b = [
{'america': 0.10640008943905088,
'delete option snapshot': 0.18889748775492732,
'done': 0.10918437741476256,
'done auto manufacturing': 0.18889748775492732,
'done auto delete': 0.18889748775492732,
'overwhelmed': 0.1714953267142263,
'overwhelmed sub': 0.18889748775492732,
'overwhelmed sub value': 0.18889748775492732},
{'delete': 0.17737631178689198,
'delete invalid': 0.2918855502796403,
'delete invalid data': 0.2918855502796403,
'invalid': 0.19409701271823834,
'invalid data': 0.2918855502796403,
'invalid data sir': 0.2918855502796403,
'nas': 0.14949544719217545,
'nas server': 0.1632884084021329,
'nas server replic': 0.2799865687396422}
]
result = filter_max_three_word_keys(b)
print("Filtered dictionaries based on keys with 3 words and maximum value:")
print(result)
Explanation of the Code
- Filtering by Word Count: The condition if len(k.split()) == 3 splits the key by whitespace and counts the number of words. This way, only keys like "delete option snapshot" are kept.
- Determining the Maximum Value: I use max(filtered.values()) to find the highest value among the filtered keys.
- Rebuilding the Dictionary: Instead of deleting keys while iterating, I construct a new dictionary top_keys containing only the keys with the maximum value. This technique avoids unexpected behavior during iteration.
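One caveat worth noting: v == max_val is an exact comparison. It works on the test data because the tied values are identical floats, but scores that come from separate calculations can differ by rounding error. The sketch below is a variation under that assumption, not a replacement for the base function; the helper name keep_near_max and the rel_tol default are hypothetical.

```python
import math


def keep_near_max(d, n_words=3, rel_tol=1e-9):
    """Hypothetical variant: keep n_words-word keys whose value is
    within rel_tol of the maximum instead of exactly equal to it."""
    filtered = {k: v for k, v in d.items() if len(k.split()) == n_words}
    if not filtered:
        return {}
    max_val = max(filtered.values())
    return {
        k: v
        for k, v in filtered.items()
        if math.isclose(v, max_val, rel_tol=rel_tol)
    }


# 0.1 + 0.2 is not exactly 0.3 in floating point, so an exact
# comparison would treat these two keys as different.
sample = {"a b c": 0.1 + 0.2, "d e f": 0.3, "g h i": 0.25, "j k": 0.9}
near = keep_near_max(sample)
print(sorted(near))
```

Whether a tolerance is appropriate depends on where the values come from; for values copied from a single source, the exact comparison is fine.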
Extended Functionality for Practice
To make my code more flexible, I extended it with additional functionality. In this version, you can:
- Choose an n-gram size: Specify the number of words each key must have.
- Select top N keys: Instead of keeping only the keys equal to the maximum value, this function sorts the keys by value (and lexicographically if values are the same) and then retains the top N keys.
Here’s the enhanced code:
def filter_top_keys(dicts_list, ngram_size=3, top_n=2):
    """
    For each dictionary in dicts_list:
    - Keep only keys that are exactly ngram_size words long.
    - Sort the keys by value in descending order.
    - In case of equal values, sort lexicographically.
    - Return the top 'top_n' keys. If there are fewer than top_n keys, return them all.

    Parameters:
    - dicts_list: List of dictionaries.
    - ngram_size: The required number of words in keys.
    - top_n: The number of top keys to keep.

    Returns:
    - List of dictionaries with filtered keys.
    """
    new_dicts = []
    for d in dicts_list:
        # Filter keys based on the specified ngram size.
        filtered = {k: v for k, v in d.items() if len(k.split()) == ngram_size}
        if not filtered:
            new_dicts.append({})
            continue
        # Create a sorted list of (key, value) tuples.
        # First, sort lexicographically (to have a consistent order for ties).
        # Then, sort by value in descending order. Python's sort is stable,
        # so the lexicographic order is preserved among equal values.
        sorted_items = sorted(filtered.items(), key=lambda item: item[0])
        sorted_items = sorted(sorted_items, key=lambda item: item[1], reverse=True)
        # Take the top_n items.
        top_items = sorted_items[:top_n]
        new_dicts.append(dict(top_items))
    return new_dicts

# Extended functionality test:
print("\nExtended functionality (choosing top N keys based on value):")
# For example: top 2 keys with exactly 3 words from each dictionary.
extended_result = filter_top_keys(b, ngram_size=3, top_n=2)
print(extended_result)
Explanation of the Extended Function
- Dynamic n-gram Filtering: By setting the ngram_size parameter (default is 3), you can filter keys based on any desired number of words.
- Sorting the Dictionary: The function sorts keys lexicographically first so that when values tie, there’s a consistent order, and then it sorts by value in descending order to prioritize higher values.
- Selecting the Top Keys: The sorted list is sliced to return only the top top_n keys. For example, even if more than two keys meet the criteria, only the two with the highest values are returned.
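The two-pass sort relies on Python's sort being stable. An equivalent ordering can be had in a single pass with a composite sort key, sketched below for one dictionary. The function name top_items_single_sort and the sample data are hypothetical, and negating the value assumes the values are numeric.

```python
def top_items_single_sort(d, ngram_size=3, top_n=2):
    """Hypothetical variant of the sort-and-slice step that uses
    one composite sort key instead of two stable sorts."""
    filtered = [(k, v) for k, v in d.items() if len(k.split()) == ngram_size]
    # Negating the value orders high-to-low; the key string then
    # breaks ties in lexicographic order.
    ranked = sorted(filtered, key=lambda item: (-item[1], item[0]))
    return dict(ranked[:top_n])


# Made-up sample data for this illustration.
sample = {
    "nas server replic": 0.28,
    "invalid data sir": 0.29,
    "delete invalid data": 0.29,
    "nas server": 0.16,
}
top = top_items_single_sort(sample)
print(top)
```

Both approaches give the same ordering; the single-key form is shorter, while the two-pass form also works for values that cannot be negated.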
Running the Code
Base Function Output
When I run the base function filter_max_three_word_keys(b), I get:
[
{'delete option snapshot': 0.18889748775492732,
'done auto manufacturing': 0.18889748775492732,
'done auto delete': 0.18889748775492732,
'overwhelmed sub value': 0.18889748775492732},
{'delete invalid data': 0.2918855502796403,
'invalid data sir': 0.2918855502796403}
]
Extended Function Output
For the extended version with top_n=2, the output is:
[
{'delete option snapshot': 0.18889748775492732, 'done auto delete': 0.18889748775492732},
{'delete invalid data': 0.2918855502796403, 'invalid data sir': 0.2918855502796403}
]
You can adjust ngram_size and top_n depending on your specific needs.
Final Thought
I’ve found this exercise both challenging and rewarding. Not only did it give me an opportunity to fine-tune my understanding of dictionary manipulations in Python, but it also reinforced the importance of iterating safely and ensuring that conditions are accurately applied. Building the extended functionality taught me to write more flexible and reusable code, a principle I hold dearly as a Python developer.