Developers Planet

Dict Deconstruction and Reconstruction in Python

I have multiple dictionaries. There is a great deal of overlap between the dictionaries, but they are not identical.

``````a = {'a':1,'b':2,'c':3}
b = {'a':1,'c':3, 'd':4}
c = {'a':1,'c':3}
``````

I'm trying to figure out how to break these down into the most primitive pieces and then reconstruct the dictionaries in the most efficient manner. In other words, how can I deconstruct and rebuild the dictionaries by typing each key/value pair the minimum number of times (ideally once). It also means creating the minimum number of sets that can be combined to create all possible sets that exist.

In the above example. It could be broken down into:

``````c = {'a':1,'c':3}
a = dict(c.items() + {'b':2})
b = dict(c.items() + {'d':4})
``````

I'm looking for suggestions on how to approach this in Python.

In reality, I have roughly 60 dictionaries and many of them have overlapping values. I'm trying to minimize the number of times I have to type each k/v pair to minimize potential typo errors and make it easier to cascade update different values for specific keys.

An ideal output would be the most basic dictionaries needed to construct all dictionaries as well as the formula for reconstruction.

en_Knight February 2016

Here is a solution. It isn't the most efficient in any way, but it might give you an idea of how to proceed.

``````a = {'a':1,'b':2,'c':3}
b = {'a':1,'c':3, 'd':4}
c = {'a':1,'c':3}

class Cover:
def __init__(self,*dicts):
# Our internal representation is a link to any complete subsets, and then a dictionary of remaining elements
mtx = [[-1,{}] for d in dicts]
for i,dct in enumerate(dicts):
for j,odct in enumerate(dicts):
if i == j: continue # we're always a subset of ourself
# if everybody in A is in B, create the reference
if all( k in dct for k in odct.keys() ):
mtx[i][0] = j
dif = {key:value for key,value in dct.items() if key not in odct}
mtx[i][1].update(dif)
break

for i,m in enumerate(mtx):
if m[1] == {}: m[1] = dict(dicts[i].items())

self.mtx = mtx

def get(self, i):
r = { key:val for key, val in self.mtx[i][1].items()}
if (self.mtx[i][0] > 0): # if we had found a subset, add that
r.update(self.mtx[self.mtx[i][0]][1])
return r

cover = Cover(a,b,c)

print(a,b,c)
print('representation',cover.mtx)
# prints [[2, {'b': 2}], [2, {'d': 4}], [-1, {'a': 1, 'c': 3}]]
# The "-1" In the third element indicates this is a building block that cannot be reduced; the "2"s indicate that these should build from the 2th element
print('a',cover.get(0))
print('b',cover.get(1))
print('c',cover.get(2))
``````

The idea is very simple: if any of the maps are complete subsets, substitute the duplication for a reference. The compression could certainly backfire for certain matrix combinations, and can be easily improved upon

Simple improvements

Change the first line of get, or using your more concise dictionary addition code hinted at in the question, might immediately improve readabil

B. M. February 2016

Below a script to generate a script that reconstruct dictionaries.

For example consider this dictionary of dictionaries:

``````>>>dicts
{'d2': {'k4': 'k4', 'k1': 'k1'},
'd0': {'k2': 'k2', 'k4': 'k4', 'k1': 'k1', 'k3': 'k3'},
'd4': {'k4': 'k4', 'k0': 'k0', 'k1': 'k1'},
'd3': {'k0': 'k0', 'k1': 'k1'},
'd1': {'k2': 'k2', 'k4': 'k4'}}
``````

For clarity, we continue with sets because the association key value can be done elsewhere.

``````sets= {k:set(v.keys()) for k,v in dicts.items()}

>>>sets
{'d2': {'k1', 'k4'},
'd0': {'k1', 'k2', 'k3', 'k4'},
'd4': {'k0', 'k1', 'k4'},
'd3': {'k0', 'k1'},
'd1': {'k2', 'k4'}}
``````

Now compute the distances (number of keys to add or/and remove to go from one dict to another):

``````df=pd.DataFrame(dicts)
charfunc=df.notnull()
distances=pd.DataFrame((charfunc.values.T[...,None] != charfunc.values).sum(1),
df.columns,df.columns)

>>>>distances
d0  d1  d2  d3  d4
d0   0   2   2   4   3
d1   2   0   2   4   3
d2   2   2   0   2   1
d3   4   4   2   0   1
d4   3   3   1   1   0
``````

Then the script that write the script. The idea is to begin with the shortest set, and then at each step to construct the nearest set from those already built:

``````script=open('script.py','w')
dicoto=df.count().argmin() # the shortest set
script.write('res={}\nres['+repr(dicoto)+']='+str(sets[dicoto])+'\ns=[\n')
done=[]
todo=df.columns.tolist()
while True :
done.append(dicoto)
todo.remove(dicoto)
if not todo : break
table=distances.loc[todo,done]
ito,ifrom=np.unravel_index(table.values.argmin(),table.shape)
dicofrom=table.columns[ifrom]
setfrom=sets[dicofrom]
dicoto=table.index[ito]
setto=sets[dicoto]
toremove=setfrom-setto
``` ```
``` ```
``` ```
``` ```
``` ```
``` Post Status Asked in February 2016Viewed 1,278 timesVoted 13Answered 2 times Search Leave an answer ```
``` ```
``` ```
``` Quote of the day: live life .btn-primary{ background-color: #f44336 !important; border-color: #f44336 !important; } Devs Planet ® 2014-2016 www.devsplanet.com Devs Planet © all rights reserved Quick Actions Search // Used to toggle the menu on small screens when clicking on the menu button function myFunction() { var x = document.getElementById("navDemo"); if (x.className.indexOf("w3-show") == -1) { x.className += " w3-show"; } else { x.className = x.className.replace(" w3-show", ""); } } ```