Neuril February 2016

R apply function on data frame column

I need to efficiently parse one column of my data frame (a url string) by calling a function (strsplit) on it, e.g.:

url <- c("")


My data frame looks like this:

                    classes              url

This df has 100k rows, and I don't want to loop over it, parse each url separately and write the results to a new data frame one row at a time. What I DO want is to create a new six-column data frame:

df.result <- data.frame(fullurl = as.character(),baseurl=as.character(), firstlevel = as.character(), secondlevel=as.character(),thirdlevel=as.character(),classificaiton=as.character())

Then I want to call one of the "apply" family functions over the url column and write the results to the new data frame df.result, such that the first column (fullurl) is populated with the original url and the 2nd to 5th columns are populated with the results of applying strsplit to that url: take only the first, 2nd, 3rd and 4th elements of the resulting vector and put them in baseurl, firstlevel, secondlevel and thirdlevel. Finally, put the classes column in the new data frame column df.result$classificaiton.

Sorry for the complexity, and let me know if anything needs further clarification.


doker February 2016

The simple solution is to use:

apply(row, 2, function(col) {})
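
For a single column it is usually easier to run an apply-family function over the column itself than over the whole data frame. The following is only a sketch under assumptions: the sample data and placeholder URLs are made up for illustration, the helper name first_four is hypothetical, and the last column is spelled classification here rather than as in the question.

```r
# Hypothetical sample data; the URLs are placeholders, not the asker's.
df <- data.frame(
  classes = c("[107,662]", "[685]"),
  url = c("example.com/level1/level2/level3", "example.org/a/b"),
  stringsAsFactors = FALSE
)

# Keep the first four pieces of one split URL (NA where a piece is missing).
first_four <- function(parts) parts[1:4]

# strsplit is already vectorised over the column; vapply then normalises
# each result to length 4. The result is 4 x n, so transpose it to n x 4.
pieces <- t(vapply(strsplit(df$url, "/", fixed = TRUE), first_four, character(4)))

df.result <- data.frame(
  fullurl = df$url,
  baseurl = pieces[, 1],
  firstlevel = pieces[, 2],
  secondlevel = pieces[, 3],
  thirdlevel = pieces[, 4],
  classification = df$classes,
  stringsAsFactors = FALSE
)
```

vapply is used instead of sapply so that the shape of each result is checked; URLs with fewer than four segments simply get NA in the trailing columns.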

Heroka February 2016

You could consider using the package splitstackshape to do this; we can use its cSplit function. Setting drop to FALSE ensures that the original column is preserved. Note that it returns a data.table, not a data.frame.

library(splitstackshape)
output <- cSplit(dat, 2, sep = "/", drop = FALSE)  # split column 2 (url) on "/"

data used:

dat <- data.frame(classes="[107,662,685,508,111,654,509]",

K. Rohde February 2016

There is no need for apply, as far as I see it.

Try this: <- data.frame(classes = c(107,662,685,508,111,654,509), 
  url = c("", "", 
          "", "", 
          "", "", 
          ""), stringsAsFactors = FALSE)

df.result <-

names(df.result) <- c("classification", "fullurl")

df.result[c("baseurl", "firstlevel", "secondlevel", "thirdlevel")] <-, strsplit(df.result$fullurl, "/"))
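
The right-hand side of that last assignment is cut off in the text above, so the following is only a guess at the general idea, not the author's code: split every URL at once and bind the pieces row-wise into a matrix. The data frame name mydf, the placeholder URLs and the padding step are assumptions added for illustration.

```r
# Hypothetical sample data; the URLs are placeholders.
mydf <- data.frame(
  classes = c(107, 662),
  url = c("example.com/level1/level2/level3", "example.org/a"),
  stringsAsFactors = FALSE
)

df.result <- mydf
names(df.result) <- c("classification", "fullurl")

# strsplit works on the whole column in one call and returns a list.
parts <- strsplit(df.result$fullurl, "/", fixed = TRUE)

# Pad or truncate every element to exactly four pieces so that rbind
# receives rows of equal length (shorter URLs get NA).
parts <- lapply(parts, function(p) p[seq_len(4)])

# Bind the pieces into an n x 4 matrix and assign it as four new columns.
df.result[c("baseurl", "firstlevel", "secondlevel", "thirdlevel")] <-
  do.call(rbind, parts)
```

Without the padding step, rbind would recycle the shorter vectors, which silently repeats early segments in the later columns.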

docendo discimus February 2016

Here's an option with data.table which should be pretty fast. If your data looks like this:

> df
#                        classes                                   url
#1 [107,662,685,508,111,654,509]

You can do the following:

library(data.table)
setDT(df)  # convert to data.table by reference
cols <- c("baseurl", "firstlevel", "secondlevel", "thirdlevel") # define new column names
df[, (cols) := tstrsplit(url, "/", fixed = TRUE)[1:4]]  # assign new columns

Now, the data looks like this:

> df
#                         classes                                   url          baseurl firstlevel secondlevel thirdlevel
#1: [107,662,685,508,111,654,509]     level1      level2     level3

Post Status

Asked in February 2016
Viewed 3,291 times
Voted 8
Answered 4 times

