Would it be considered "pythonic" to use a nested defaultdict where bottom level is defaulted to 0 for counting?

I am building something to sort and add values from an API response. I ended up going with an interesting structure, and I just want to make sure there's nothing inherently wrong with it.

    from collections import defaultdict

    # Helps create a unique nested default dict object
    # for code readability
    def dict_counter():
        return defaultdict(lambda: 0)

    # Creates the nested defaultdict object
    ad_data = defaultdict(dict_counter)

    # Sorts each instance into its channel, and
    # adds the dict values incrementally
    for ad in example:

        # Collects channel and metrics
        channel = ad['ad_group']['type_']
        metrics = dict(
            impressions=int(ad['metrics']['impressions']),
            clicks=int(ad['metrics']['clicks']),
            cost=int(ad['metrics']['cost_micros']),
        )

        # Adds the variables
        ad_data[channel]['impressions'] += metrics['impressions']
        ad_data[channel]['clicks'] += metrics['clicks']
        ad_data[channel]['cost'] += metrics['cost']

The output is as desired. Again, I just want to make sure I'm not reinventing the wheel or doing something really inefficient here.

    defaultdict(<function __main__.dict_counter()>,
                {'DISPLAY_STANDARD': defaultdict(<function __main__.dict_counter.<locals>.<lambda>()>,
                             {'impressions': 14, 'clicks': 4, 'cost': 9}),
                 'SEARCH_STANDARD': defaultdict(<function __main__.dict_counter.<locals>.<lambda>()>,
                             {'impressions': 6, 'clicks': 2, 'cost': 4})})

Here's what my input data would look like:

    example = [
        {
            'campaign': {
                'resource_name': 'customers/12345/campaigns/12345',
                'status': 'ENABLED',
                'name': 'test_campaign_2'
            },
            'ad_group': {
                'resource_name': 'customers/12345/adGroups/12345',
                'type_': 'DISPLAY_STANDARD'
            },
            'metrics': {
                'clicks': '1', 'cost_micros': '3', 'impressions': '5'
            },
            'ad_group_ad': {
                'resource_name': 'customers/12345/adGroupAds/12345~12345',
                'ad': {
                    'resource_name': 'customers/12345/ads/12345'
                }
            }
        },
        {
            'campaign': {
                'resource_name': 'customers/12345/campaigns/12345',
                'status': 'ENABLED',
                'name': 'test_campaign_2'
            },
            'ad_group': {
                'resource_name': 'customers/12345/adGroups/12345',
                'type_': 'SEARCH_STANDARD'
            },
            'metrics': {
                'clicks': '2', 'cost_micros': '4', 'impressions': '6'
            },
            'ad_group_ad': {
                'resource_name': 'customers/12345/adGroupAds/12345~12345',
                'ad': {
                    'resource_name': 'customers/12345/ads/12345'
                }
            }
        },
        {
            'campaign': {
                'resource_name': 'customers/12345/campaigns/12345',
                'status': 'ENABLED',
                'name': 'test_campaign_2'
            },
            'ad_group': {
                'resource_name': 'customers/12345/adGroups/12345',
                'type_': 'DISPLAY_STANDARD'
            },
            'metrics': {
                'clicks': '3', 'cost_micros': '6', 'impressions': '9'
            },
            'ad_group_ad': {
                'resource_name': 'customers/12345/adGroupAds/12345~12345',
                'ad': {
                    'resource_name': 'customers/12345/ads/12345'
                }
            }
        }
    ]

Thanks!

2 comments.

gog

October 10th, 2022 at 10:04 pm

I think you overthought this one a bit. Consider this simple function that sums two dicts:
    def add_dicts(a, b):
        return {
            k: int(a.get(k, 0)) + int(b.get(k, 0))
            for k in a | b
        }
Using this func, the main loop gets trivial:
    stats = {}

    for obj in example:
        t = obj['ad_group']['type_']
        stats[t] = add_dicts(stats.get(t, {}), obj['metrics'])
That's it. No defaultdicts needed.
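For anyone who wants to try this quickly, here is a minimal, self-contained sketch of the approach above. The `sample` list is a trimmed-down stand-in for the question's `example` (only the keys the loop touches), and it assumes Python 3.9 or later, since the `|` operator on dicts was added in that version.

```python
# Trimmed-down stand-in for the question's `example` list;
# only the keys the loop actually reads are kept.
sample = [
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '1', 'cost_micros': '3', 'impressions': '5'}},
    {'ad_group': {'type_': 'SEARCH_STANDARD'},
     'metrics': {'clicks': '2', 'cost_micros': '4', 'impressions': '6'}},
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '3', 'cost_micros': '6', 'impressions': '9'}},
]


def add_dicts(a, b):
    # `a | b` merges both dicts (Python 3.9+), so iterating over it
    # visits every key that appears in either one.
    return {k: int(a.get(k, 0)) + int(b.get(k, 0)) for k in a | b}


stats = {}
for obj in sample:
    t = obj['ad_group']['type_']
    # Start from an empty dict the first time a channel is seen.
    stats[t] = add_dicts(stats.get(t, {}), obj['metrics'])

print(stats)
```

Note that the result keeps the input key names, so the cost total lives under `cost_micros` rather than `cost`. If the renamed key matters, a small mapping step would still be needed.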

Samwise

October 10th, 2022 at 09:41 pm

There's nothing wrong with the code you have, but the code for copying the values from one dict to another is a bit repetitive and a little vulnerable to mis-pasting a key name. I'd suggest putting the mapping between the keys in a dict so that there's a single source of truth for what keys you're copying from the input metrics dicts and what keys that data will live under in the output:
    fields = {
        # Map input metrics dicts to per-channel metrics dicts.
        'impressions': 'impressions',  # same
        'clicks': 'clicks',  # same
        'cost_micros': 'cost',  # different
    }
Since each dict in your output is going to contain the keys from fields.values(), you have the option of creating these as plain dicts with their values initialized to zero rather than as defaultdicts (this doesn't have any major benefits over defaultdict(int), but it does make pretty-printing a bit easier):
    # Create defaultdict of per-channel metrics dicts.
    ad_data = defaultdict(lambda: dict.fromkeys(fields.values(), 0))
and then you can do a simple nested iteration to populate ad_data:
    # Aggregate input metrics into per-channel metrics.
    for ad in example:
        channel = ad['ad_group']['type_']
        for k, v in ad['metrics'].items():
            ad_data[channel][fields[k]] += int(v)
which for your example input produces:
    {'DISPLAY_STANDARD': {'impressions': 14, 'clicks': 4, 'cost': 9},
     'SEARCH_STANDARD': {'impressions': 6, 'clicks': 2, 'cost': 4}}
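One thing to be aware of: the inner loop looks up `fields[k]` for every key in the input metrics dict, so a metrics dict carrying a key that isn't listed in `fields` would raise a `KeyError`. If that could happen with your data, one possible variation (a sketch, not the only way to do it) is to iterate over `fields` instead. The `sample` list below is a trimmed-down stand-in for your `example`, and the `'conversions'` key is a made-up extra metric added purely to show the difference.

```python
from collections import defaultdict

# Trimmed-down stand-in for the question's `example` list.
# 'conversions' is a hypothetical extra key, not part of the original data.
sample = [
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '1', 'cost_micros': '3', 'impressions': '5',
                 'conversions': '7'}},
    {'ad_group': {'type_': 'SEARCH_STANDARD'},
     'metrics': {'clicks': '2', 'cost_micros': '4', 'impressions': '6'}},
    {'ad_group': {'type_': 'DISPLAY_STANDARD'},
     'metrics': {'clicks': '3', 'cost_micros': '6', 'impressions': '9'}},
]

# Map input metric names to output metric names.
fields = {
    'impressions': 'impressions',
    'clicks': 'clicks',
    'cost_micros': 'cost',
}

# Each new channel starts as a plain dict with every output key set to 0.
ad_data = defaultdict(lambda: dict.fromkeys(fields.values(), 0))

for ad in sample:
    channel = ad['ad_group']['type_']
    # Iterating over `fields` (rather than the input dict) skips any
    # input key that has no mapping, and treats a missing metric as 0.
    for src, dst in fields.items():
        ad_data[channel][dst] += int(ad['metrics'].get(src, 0))

# dict() drops the outer defaultdict wrapper so it prints as a plain dict.
result = dict(ad_data)
print(result)
```

The trade-off is that unexpected keys are silently ignored rather than reported, so the original version may be preferable if you would rather fail loudly on unknown input.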
