Bitanshu Das February 2016

URL semantic analysis using R

I have a dataset like this:

URL                                                      Keywords          Impressions   Clicks
http://www.thetelegraphandargus.co.uk/sport/sportbcfc    sports|football   5500          456

I want to "explode" the dataset into one row per keyword, with the URL path split out into a separate URL Keyword column, like this:

URL                                      URL Keyword   Keyword    Impressions   Clicks
http://www.thetelegraphandargus.co.uk    sport         sports     5500          456
http://www.thetelegraphandargus.co.uk    sportbcfc     football   5500          456
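A minimal base R sketch of the whole transformation, assuming (as the desired output implies) that the i-th path segment after the host pairs with the i-th keyword, and that any segment or keyword without a partner can be dropped. The function name explode_urls and the column name URL_Keyword are hypothetical, and the regexes assume absolute URLs of the form scheme://host/path. It uses only base R, so it does not depend on stringr or urltools.

```r
# Hypothetical helper: one output row per (path segment, keyword) pair.
# Assumption: the i-th path segment pairs with the i-th keyword; unmatched extras are dropped.
explode_urls <- function(df) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    url <- as.character(df$URL[i])
    # Scheme + host, e.g. "http://www.thetelegraphandargus.co.uk"
    domain <- sub("^([A-Za-z]+://[^/]+).*$", "\\1", url)
    # Path after the host, with any query string or fragment removed
    path <- sub("[?#].*$", "", sub("^[A-Za-z]+://[^/]+", "", url))
    segments <- strsplit(path, "/", fixed = TRUE)[[1]]
    segments <- segments[segments != ""]  # drop empty pieces from leading/double slashes
    # fixed = TRUE: "|" is a literal pipe, not a regex alternation
    keywords <- strsplit(as.character(df$Keywords[i]), "|", fixed = TRUE)[[1]]
    n <- min(length(segments), length(keywords))
    data.frame(
      URL         = rep(domain, n),
      URL_Keyword = segments[seq_len(n)],
      Keyword     = keywords[seq_len(n)],
      Impressions = rep(df$Impressions[i], n),
      Clicks      = rep(df$Clicks[i], n),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)  # stack the per-URL data frames
}

# The single row from the question
dta <- data.frame(
  URL = "http://www.thetelegraphandargus.co.uk/sport/sportbcfc",
  Keywords = "sports|football",
  Impressions = 5500,
  Clicks = 456,
  stringsAsFactors = FALSE
)
out <- explode_urls(dta)
print(out)
```

If a URL has more path segments than keywords (or the reverse), this sketch silently truncates to the shorter list; comparing lengths() of the two splits first would flag such rows.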

I have tried splitting them using the stringr and urltools packages:

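ee <- as.character(data$URL)
eee <- strsplit(ee, "/")          # list: one character vector per URL

maxLen <- max(lengths(eee))       # number of pieces in the longest URL

# Pad every vector with NA up to maxLen, then bind into a matrix (one row per URL)
L <- t(sapply(eee, function(x) c(x, rep(NA, maxLen - length(x)))))

url_df <- data.frame(L)           # wide format; avoids the name F, which masks FALSE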

and

library(urltools)
d <- url_parse(as.character(data$URL))   # the column is URL, not url

I can split the URLs, but the pieces end up as extra columns in the same row instead of as separate rows in the format above.
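The pieces land in one row because the NA-padding step turns the list returned by strsplit() into a wide matrix. A base R sketch that keeps the list long instead: repeat a row identifier once per piece with rep() and lengths(), then unlist() the pieces. It assumes absolute URLs of the form scheme://host/path, so the first three pieces (the scheme, the empty string between the two slashes, and the host) are dropped; the second URL is a made-up example.

```r
urls <- c("http://www.thetelegraphandargus.co.uk/sport/sportbcfc",
          "http://www.example.com/news")  # hypothetical second URL
parts <- strsplit(urls, "/", fixed = TRUE)
# Drop "http:", the empty piece from "//", and the host (assumes scheme://host/path)
segments <- lapply(parts, function(x) x[-(1:3)])
# Long format: one row per path segment, tagged with the index of its source URL
long <- data.frame(
  url_id  = rep(seq_along(urls), lengths(segments)),
  segment = unlist(segments),
  stringsAsFactors = FALSE
)
print(long)
```

The url_id column can then be used to join each segment back to the Impressions and Clicks of its original row.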

PS: I split the Keywords column in Excel, using its delimiter option with "|" as the delimiter.
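If the Keywords split moves from Excel into R, note that strsplit() treats its split argument as a regular expression by default, and | is the regex alternation operator, so strsplit(x, "|") does not split at the pipe as intended. A small sketch of two ways to match the literal character:

```r
kw <- c("sports|football", "news")
# Literal match: the pipe is taken as a plain character
split_fixed <- strsplit(kw, "|", fixed = TRUE)
# Equivalent: escape the pipe so the regex engine matches it literally
split_regex <- strsplit(kw, "\\|")
print(split_fixed)
```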

Answers


Sotos February 2016

Here is how cSplit from the splitstackshape package does it:

library(splitstackshape)
cSplit(dta1, "keywords", direction = "long", sep = "|")   # "long" gives one row per keyword
                                                      a1 keywords   a3  a4
1: http://www.thetelegraphandargus.co.uk/sport/sportbcfc   sports 5500 456
2: http://www.thetelegraphandargus.co.uk/sport/sportbcfc football 5500 456
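For comparison, a base R sketch of what the long split does here without splitstackshape: repeat each row once per keyword, then fill in the individual keywords. The data frame dta1 and its columns a1, keywords, a3, a4 are a guess at the answer's sample data, reconstructed from the printed output above.

```r
# Assumed reconstruction of the answer's sample data
dta1 <- data.frame(
  a1 = "http://www.thetelegraphandargus.co.uk/sport/sportbcfc",
  keywords = "sports|football",
  a3 = 5500,
  a4 = 456,
  stringsAsFactors = FALSE
)
kw <- strsplit(as.character(dta1$keywords), "|", fixed = TRUE)
# Repeat each row once per keyword, then overwrite the keywords column
long <- dta1[rep(seq_len(nrow(dta1)), lengths(kw)), , drop = FALSE]
long$keywords <- unlist(kw)
rownames(long) <- NULL  # drop the "1", "1.1" row names created by the repetition
print(long)
```

Like the cSplit call, this only splits the keywords: the a1 column still holds the full URL, so producing the URL Keyword column from the question still requires splitting the path as well.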

Post Status

Asked in February 2016
Viewed 3,170 times
Voted 7
Answered 1 time
