# Developers Planet

rontho1992 February 2016

### Measuring the distance between points and groups

I am trying to measure the distance between points inside a pandas DataFrame. First, I want to measure the distance between points that are in the same subregion and get the average distance for that group. Then I want to measure the distance between the subregions (measuring the distance between those two vectors). I understand how to do the measuring part (using `scipy.spatial.distance.euclidean` for the former and `scipy.spatial.distance.cdist` for the latter). The issue I am running into is figuring out how to apply the functions to the dataset. I think I should use `groupby.apply()` and feed in my function, but I'm having trouble conceptualizing that. The DataFrame looks like this:

``````
id, latitude, longitude, subregion, region
``````

Currently I have:

``````
import pandas as pd
import numpy as np
from scipy.spatial.distance import euclidean

...
def calculate_distance(x,y):
    return x._get_numeric_data().apply(axis=0, func=euclidean[x,y]).mean()

df.groupby('subregion').apply(calculate_distance)
``````

I know this is incorrect, because I want to apply the function to multiple columns across all the rows. My other thought is that I am using the wrong data structure for this.
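For reference, here is a minimal sketch of how the `groupby` idea described above might be made to work. It assumes the column names listed in the question and swaps `euclidean` for `scipy.spatial.distance.pdist`, which returns every pairwise distance within a set of points in one call. The helper name `mean_pairwise_distance` and the sample rows are hypothetical, made up purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist


def mean_pairwise_distance(group):
    """Mean Euclidean distance over all pairs of points in one group."""
    coords = group[["latitude", "longitude"]].to_numpy()
    if len(coords) < 2:
        return np.nan  # a single point has no pairwise distance
    return pdist(coords, metric="euclidean").mean()


# Hypothetical sample rows using the column names from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "latitude": [0.0, 3.0, 10.0, 10.0, 20.0],
    "longitude": [0.0, 4.0, 10.0, 12.0, 20.0],
    "subregion": ["a", "a", "b", "b", "c"],
    "region": ["r1", "r1", "r1", "r1", "r1"],
})

# Select the coordinate columns first so each group passed to the
# function contains only latitude and longitude.
subregion_means = (
    df.groupby("subregion")[["latitude", "longitude"]]
      .apply(mean_pairwise_distance)
)
print(subregion_means)
```

The result is a Series indexed by subregion, so it can be joined back onto the original DataFrame if a per-row value is needed.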

rontho1992 February 2016

I ended up using a different data structure, a nested dictionary keyed by region and then by subregion. In the end it looks like this:

``````
import itertools

from scipy.spatial.distance import euclidean

contacts = {}

# Collect the coordinates of every row under its region and subregion
for i, row in sc_walkbook.iterrows():
    if contacts.get(row['region'], 0) == 0:
        # First time this region is seen: create it along with the subregion
        contacts[row['region']] = {}
        contacts[row['region']][row['subregion']] = {}
        contacts[row['region']][row['subregion']]['coords'] = []
        contacts[row['region']][row['subregion']]['distances'] = []
    elif contacts[row['region']].get(row['subregion'], 0) == 0:
        # Known region, new subregion
        contacts[row['region']][row['subregion']] = {}
        contacts[row['region']][row['subregion']]['coords'] = []
        contacts[row['region']][row['subregion']]['distances'] = []
    else:
        pass
    contacts[row['region']][row['subregion']]['coords'].append([row['T_Latitude'], row['T_Longitude']])

# Distance between every pair of points within each subregion
# (.values() works on both Python 2 and 3; .itervalues() is Python 2 only)
for region in contacts.values():
    for subregion in region.values():
        for a, b in itertools.combinations(subregion['coords'], 2):
            subregion['distances'].append(euclidean(a, b))
``````
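The remaining two steps from the question (the average distance per subregion, and the distance between subregions) could then be read off the nested dictionary roughly as follows. This is only a sketch: the small `contacts` dictionary is a hypothetical stand-in with the same shape as the one built above, the names `average_within` and `average_between` are made up, and summarising the `cdist` matrix by its mean is an assumption about what "distance between subregions" should mean.

```python
import itertools

import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical stand-in with the same shape as the `contacts` dictionary
contacts = {
    "r1": {
        "a": {"coords": [[0.0, 0.0], [3.0, 4.0]], "distances": [5.0]},
        "b": {"coords": [[0.0, 0.0], [0.0, 2.0]], "distances": [2.0]},
    },
}

# Average within-subregion distance (NaN if fewer than two points)
average_within = {
    (region_name, sub_name): np.mean(sub["distances"]) if sub["distances"] else np.nan
    for region_name, region in contacts.items()
    for sub_name, sub in region.items()
}

# Mean distance between the points of each pair of subregions in a region
average_between = {}
for region_name, region in contacts.items():
    for (name_a, sub_a), (name_b, sub_b) in itertools.combinations(region.items(), 2):
        # cdist returns a matrix with one row per point in sub_a
        # and one column per point in sub_b
        pairwise = cdist(sub_a["coords"], sub_b["coords"], metric="euclidean")
        average_between[(region_name, name_a, name_b)] = pairwise.mean()

print(average_within)
print(average_between)
```

Other summaries of the `cdist` matrix, such as its minimum or the distance between subregion centroids, would fit the same loop if the mean is not the measure wanted.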