Your answer is one click away!

ojdo February 2016
### Aggregate/Remove duplicate rows in DataFrame based on swapped index levels

## Sample input

## Goal

## Sample result

```
import pandas as pd
df = pd.DataFrame([
['A', 'B', 1, 5],
['B', 'C', 2, 2],
['B', 'A', 1, 1],
['C', 'B', 1, 3]],
columns=['from', 'to', 'type', 'value'])
df = df.set_index(['from', 'to', 'type'])
```

Which looks like this:

```
value
from to type
A B 1 5
B C 2 2
A 1 1
C B 1 3
```

I now want to remove "duplicate" rows from this in the following sense: for each row with an arbitrary index `(A, B, type)`

, if there exists a row `(B, A, type)`

, the value of the *second* row should be **added** to the *first* row and the second row discarded. In the example above, the row `(B, A, 1)`

with value `1`

should be added to the first row and discarded, leading to the following desired result.

```
value
from to type
A B 1 6
B C 2 2
C B 1 3
```

This is my best try so far. It feels unnecessarily verbose and clunky:

```
# aggregate val of rows with (a,b,t) == (b,a,t)
df2 = df.reset_index()
df3 = df2.rename(columns={'from':'to', 'to':'from'})
df_both = df.join(df3.set_index(
['from', 'to', 'type']),
rsuffix='_b').sum(axis=1)
# then remove the second, i.e. (b,a,t) row
rows_to_keep = []
rows_to_remove = []
for a,b,t in df_both.index:
if (b,a,t) in df_both.index and not (b,a,t) in rows_to_keep:
rows_to_keep.append((a,b,t))
rows_to_remove.append((b,a,t))
df_final = df_both.drop(rows_to_remove)
df_final
```

Especially the second "de-duplication" step feels very unpythonic. (How) can I improve these steps?

Not sure how much better this is, but it's certainly different

```
import pandas as pd
from collections import Counter
df = pd.DataFrame([
['A', 'B', 1, 5],
['B', 'C', 2, 2],
['B', 'A', 1, 1],
['C', 'B', 1, 3]],
columns=['from', 'to', 'type', 'value'])
df = df.set_index(['from', 'to', 'type'])
ls = df.to_records()
ls = list(ls)
ls2=[]
for l in ls:
i=0
while i <= l[3]:
ls2.append(list(l)[:3])
i+=1
counted = Counter(tuple(sorted(entry)) for entry in ls2)
```

Asked in February 2016

Viewed 1,029 times

Voted 6

Answered 1 times

Viewed 1,029 times

Voted 6

Answered 1 times