I am trying to tokenize a dataframe with one column, using the following code:
df = pd.DataFrame(pd.read_csv(args), index= None)
doc_set = pd.DataFrame(df.Country)
tokenizer = RegexpTokenizer(r'\w+')
en_stop = get_stop_words('en')
p_stemmer = PorterStemmer()
texts = []
for i in doc_set:
    raw = i.lower()
    tokens = tokenizer.tokenize(raw)
    stopped_tokens = [i for i in tokens if not i in en_stop]
    stemmed_tokens = [p_stemmer.stem(i) for i in stopped_tokens]
This code outputs only the header of the dataframe, which I created from a CSV file.
Please help me find what is wrong with my approach.
When Python starts spitting out things that make no sense to me, I have gotten into the habit of downloading the latest source, compiling to /usr/local, and reinstalling everything with pip. Strangely, this usually fixes things.
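For the specific symptom in the question, a likely cause is that iterating over a pandas DataFrame (`for i in doc_set`) yields the column labels, not the cell values, so the loop body runs once on the string "Country". Iterating over the column itself (`df["Country"]` or `df.Country`) yields the values. The sketch below shows the per-value loop under some loudly stated assumptions: to stay self-contained it uses the standard-library `re` module in place of `RegexpTokenizer`, a small hypothetical `EN_STOP` set in place of `get_stop_words('en')`, and leaves out stemming. `tokenize_values` is an illustrative name, not part of any library.

```python
import re

# Hypothetical stand-in for get_stop_words('en'); the real list is much longer.
EN_STOP = {"the", "of", "and", "a", "in"}

# Same pattern the question passes to RegexpTokenizer: runs of word characters.
WORD_RE = re.compile(r"\w+")


def tokenize_values(values):
    """Lowercase, tokenize and stop-word-filter each string in `values`."""
    texts = []
    for raw in values:
        tokens = WORD_RE.findall(str(raw).lower())
        texts.append([t for t in tokens if t not in EN_STOP])
    return texts


# A plain list stands in for the column here; with pandas you would pass
# the Series df["Country"], not the DataFrame itself.
countries = ["United States of America", "Republic of Korea", "India"]
print(tokenize_values(countries))
```

With the question's data, the call would be `tokenize_values(df["Country"])`. Stemming with `p_stemmer.stem` can then be applied to each token list as in the original code, and each result appended to `texts` so it is not overwritten on the next iteration.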
Asked in February 2016. Viewed 1,262 times. Voted 5. Answered 1 time.