Developers Planet

Bitanshu Das February 2016

URL semantic analysis using R

I have a dataset like this:

URL                                                       Keywords           Impressions     Clicks
http://www.thetelegraphandargus.co.uk/sport/sportbcfc     sports|football      5500           456

I want to expand the dataset into the following format, with one row per URL path segment and keyword:

URL                                       URL Keyword         Keyword         Impressions        Clicks
 http://www.thetelegraphandargus.co.uk     sport               sports           5500                456
 http://www.thetelegraphandargus.co.uk     sportbcfc           football         5500                456

I have tried splitting the URLs with the stringr and urltools libraries. This is my attempt:

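# Split each URL on "/" into its components
ee <- as.character(data$URL)
eee <- strsplit(ee, "/", fixed = TRUE)

# Pad every split with NA up to the longest length, then bind into a matrix
maxLen <- max(sapply(eee, length))
L <- t(sapply(eee, function(x) c(x, rep(NA, maxLen - length(x)))))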

I am able to split the URLs, but not into the desired format: the pieces end up as columns in the same row instead of in separate rows.

PS: For the Keywords column, I used the delimiter function in Excel with "|" as the delimiter.


Sotos February 2016

Here is how cSplit from the splitstackshape package does it:

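library(splitstackshape)
# direction = "long" puts each keyword on its own row, as in the output below
cSplit(dta1, "keywords", direction = "long", sep = "|")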
                                                      a1 keywords   a3  a4
1: http://www.thetelegraphandargus.co.uk/sport/sportbcfc   sports 5500 456
2: http://www.thetelegraphandargus.co.uk/sport/sportbcfc football 5500 456
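
The cSplit call above only expands the keywords, so the URL itself is left whole. To also get the URL path segments on separate rows, as in the format the question asks for, something along these lines in base R might work. This is only a sketch: the data frame `dta`, its column names, and the helper `explode_row` are made up for illustration, and it assumes the path segments and keywords should be paired by position (first segment with first keyword, and so on), which the question implies but does not state.

```r
# Example data mirroring the question (column names are assumptions).
dta <- data.frame(
  URL = "http://www.thetelegraphandargus.co.uk/sport/sportbcfc",
  Keywords = "sports|football",
  Impressions = 5500,
  Clicks = 456,
  stringsAsFactors = FALSE
)

# Hypothetical helper: turns one input row into one row per segment/keyword pair.
explode_row <- function(url, keywords, impressions, clicks) {
  # Separate the scheme and host from the path.
  base <- sub("^(https?://[^/]+).*$", "\\1", url)
  path <- sub("^https?://[^/]+/?", "", url)
  segments <- strsplit(path, "/", fixed = TRUE)[[1]]
  kw <- strsplit(keywords, "|", fixed = TRUE)[[1]]
  # Pad the shorter vector with NA so segments and keywords pair up by position.
  n <- max(1, length(segments), length(kw))
  length(segments) <- n
  length(kw) <- n
  data.frame(
    URL = base,
    URL_Keyword = segments,
    Keyword = kw,
    Impressions = impressions,
    Clicks = clicks,
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every row and stack the pieces.
result <- do.call(
  rbind,
  Map(explode_row, dta$URL, dta$Keywords, dta$Impressions, dta$Clicks)
)
rownames(result) <- NULL
print(result)
```

For the sample row this gives two rows, pairing sport with sports and sportbcfc with football, each carrying the original Impressions and Clicks. If a URL has more segments than keywords (or the reverse), the missing side is filled with NA rather than recycled.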

