0
Follow
2
View

# Sum values of dictionary if any character within key strings match

Try:

``````d = {"255": 8, "323": 1, "438": 1, "938": 2, "166": 1, "117": 10, "777": 2}

keys, groups = list(d), {}
while keys:
current = keys.pop()
s_current = set(current)
for k in groups:
if s_current.intersection(k):
groups[k].append(d[current])
groups[current] = groups[k]
break
else:
groups[current] = [d[current]]

out = {}
for k, v in groups.items():
if id(v) not in out:
out[id(v)] = sum(v)

print(out)
``````

Prints:

``````{140318922050368: 13, 140318911221184: 12}
``````

A little explanation:

We create a temporary dict `groups` where the keys are from original dictionary `d` and values are lists of values from the original dictionary which share a character with the key. Note: the lists are shared between the keys - total number of these lists are equal to total number of distinct groups.

This is similar to @BrokenBenchmark's answer, but it is self-contained, and it also groups things by digit first, so there's no O(n^2) doubly-nested loop over all the strings.

``````from collections import defaultdict

def get_edges(strings):
# group together common digits.
by_digit = [[] for _ in range(10)]
for i, s in enumerate(strings):
for digit in map(int, s):
by_digit[digit].append(i)

# attach the first of each group to the rest.
for digit_locations in by_digit:
if digit_locations:
x = digit_locations
for y in digit_locations[1:]:
yield (x, y)

def union_find_groups(edges, n):
parent = list(range(n))
size =  * n

def find(x):
# find (path-splitting)
while parent[x] != x:
x, parent[x] = parent[x], parent[parent[x]]
return x

def union(x, y):
x = find(x)
y = find(y)
if x == y:
return
# Union (by size)
if size[x] < size[y]:
x, y = y, x
parent[y] = x
size[x] += size[y]

for x, y in edges:
union(x, y)

result = defaultdict(list)
for x in range(n):
result[find(x)].append(x)

return result.values()

def main(input_data):
strings = list(input_data.keys())
index_groups = union_find_groups(get_edges(strings), len(strings))
string_groups = [[strings[i] for i in group] for group in index_groups]
print(string_groups)
results = [sum(input_data[s] for s in group) for group in string_groups]
return {f"group{i}": total for i, total in enumerate(results)}

if __name__ == "__main__":
result = main({
'255': 8,
'323': 1,
'438': 1,
'938': 2,
'166': 1,
'117': 10,
'777': 2
})
print(result)

``````

Result:

``````[['255', '323', '438', '938'], ['166', '117', '777']]
{'group0': 12, 'group1': 13}
``````

You can use a union-find data structure to solve this problem. The `networkx` package provides an implementation, but there's nothing stopping you from writing your own.

In essence, we maintain a collection of disjoint sets. Initially, every string belongs to its own disjoint set. For each pair of strings, if they have letters in common, we union the disjoint sets they belong to. This eventually gives us the groups that we're looking for.

From here, we use the `.to_sets()` method to get the groupings, and compute the desired sum:

``````from networkx.utils.union_find import UnionFind

data = # dictionary in question, omitted for brevity
keys = list(data.keys())

uf = UnionFind(data.keys())

for outer_idx in range(len(keys)):
for inner_idx in range(outer_idx + 1, len(keys)):
if set(keys[outer_idx]) & set(keys[inner_idx]):
uf.union(keys[outer_idx], keys[inner_idx])

result = {}
for idx, group in enumerate(uf.to_sets()):
result[idx] = sum(data[key] for key in group)

print(result)
``````

This outputs:

``````{0: 12, 1: 13}
``````

Publish Time
6 days ago
Update Time
6 days ago