Jane February 2016

R: assign values to set of vectors based on conditions in another data frame of different length

I would like to assign a segment-ID to longitude and latitude values that fall within a certain range of minimum/maximum long and lat values that are stored in a different data frame of a different length. My data looks like this: Data frame with observations:

head(obs)
longitude latitude
52.06264 6.412816 
52.06097 6.413106 
51.06097 6.413346 
54.06097 6.413276
51.06089 6.413114
52.05444 6.413094

Data frame with ranges and segment-ID:

head(seg)
segment   lon_max  lon_min  lat_max  lat_min
01a       6.857822 6.857476 51.05837 51.03489
01b       6.858979 6.857834 51.03433 50.99901
01c       6.860019 6.858982 51.99836 51.96330
01d       6.860960 6.860050 51.96277 51.92718
01e       6.862294 6.860979 51.92657 51.89125
01f       6.863179 6.862301 51.89059 51.85562

For each point of observation I would like to know within which 'segment' it falls, so I would ideally end up with something like this:

longitude latitude segment
52.03464 6.458816  1a
52.05667 6.416606  1a
51.06097 6.446346  1b
54.03757 6.413276  1c
51.06089 6.422114  1b
52.34243 6.413094  1a

I have tried to do this using just latitude, but I get an error message, due to the different length of the vectors.

obs[['segment']] <- for (i in obs$latitude) {
   if (i>=seg$lat_min & i<=seg$lat_max) {
     obs$segment=seg$segment
   } else {
     obs$segment='NA'}
}
  Error in `$<-.data.frame`(`*tmp*`, "segment", value = 1:118) : 
  replacement has 118 rows, data has 10284  

I realise why this can't work because it doesn't match row by row, but I don't know how to do this. How can I match each pair of latitude and longitude row by row with the min/max values until I find the range into which it fits and assign the correct segment_ID?

Thanks in advance!

Answers


mra68 February 2016

Given a vector x with components "longitude" and "latitude", the function f uses which.max to find the appropriate row in seg. Then apply(obs,1,f) is the vector of length nrow(obs) which contains the row numbers of the appropriate segments in seg:

obs <- read.table( header = TRUE,
                 text =
"latitude longitude
52.06264 6.412816 
51.90089 6.861084
52.06097 6.413106 
51.06097 6.413346 
54.06097 6.413276
51.04097 6.857576
51.06089 6.413114
51.95089 6.860084
52.05444 6.413094" )

seg <- read.table( header = TRUE,
                   stringsAsFactors = FALSE,
                   text = 
"segment   lon_max  lon_min  lat_max  lat_min
01a       6.857822 6.857476 51.05837 51.03489
01b       6.858979 6.857834 51.03433 50.99901
01c       6.860019 6.858982 51.99836 51.96330
01d       6.860960 6.860050 51.96277 51.92718
01e       6.862294 6.860979 51.92657 51.89125
01f       6.863179 6.862301 51.89059 51.85562")


f <- function(x)
{
  which.max( c( ( seg["lon_min"] <= c(x["longitude"]) ) &
                ( seg["lon_max"] >  c(x["longitude"]) ) &
                ( seg["lat_min"] <= c(x["latitude"])  ) &
                ( seg["lat_max"] >  c(x["latitude"])  ),
                TRUE                                      ) )
}

X <- cbind( obs, segment = seg$segment[apply(obs,1,f)] )

Result:

> X
  latitude longitude segment
1 52.06264  6.412816    <NA>
2 51.90089  6.861084     01e
3 52.06097  6.413106    <NA>
4 51.06097  6.413346    <NA>
5 54.06097  6.413276    <NA>
6 51.04097  6.857576     01a
7 51.06089  6.413114    <NA>
8 51.95089  6.860084     01d
9 52.05444  6.413094    <NA>


Scott February 2016

Your longitude and latitude were mixed up across both data frames.

Also, your example data makes it impossible to match your obs and seg since all of your obs of latitude for instance, are less than the look up vals for lat_min.

Despite all that, this should work. You are trying to do a look up table.

#create this so we have validation data
newline <- c( 51.05837, 6.857822)       
newobs <- rbind(obs, newline)

library(sqldf)
looked_up<-function(data, lookup){
data<-sqldf("select A.*,B.segment from
          data A left join lookup B 
          ON (A.longitude >= B.lon_min and A.longitude 
          <= B.lon_max and A.latitude >= B.lat_min and A.latitude 
          <= B.lat_max) ")
data
}

looked_up(newobs, seg)

#RESULTS
longitude latitude segment
1  52.06264 6.412816    <NA>
2  52.06097 6.413106    <NA>
3  51.06097 6.413346    <NA>
4  54.06097 6.413276    <NA>
5  51.06089 6.413114    <NA>
6  52.05444 6.413094    <NA>
7  51.05837 6.857822     01a

I found this helpful when answering your question.

http://shashiasrblog.blogspot.com/2014/01/excel-style-vlookup-and-rangelookup-in-r.html

Post Status

Asked in February 2016
Viewed 3,145 times
Voted 14
Answered 2 times

Search




Leave an answer