Naresh MG February 2016

python CountVectorizer() vocabulary_ get method returns None

I have this piece of code, written following the documentation at http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html:

from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()

my_bunch = load_files("c:\\temp\\billing_test\\")

my_data = my_bunch['data']
print(my_bunch.keys())
print('length of data', len(my_bunch['data']))

X_train_counts = count_vect.fit_transform(my_data)

print(count_vect.vocabulary_.get(u'algorithm'))

The output is as follows:

dict_keys(['target', 'filenames', 'target_names', 'data', 'DESCR'])
target_names ['false', 'true']
length of data 920
(920, 8773)
None

I wonder why "None" is printed at the bottom, after (920, 8773).

I have around 460 text documents in each of the folders "true" and "false".



Farseer February 2016

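Because the word 'algoritham' never appeared in your documents. vocabulary_ is a plain dict that maps each term seen during fitting to its column index, so .get() returns None for any term that is not one of its keys.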

Perhaps you should try 'algorithm'.
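
A quick way to check this yourself is to look a term up in a small fitted vocabulary. The sketch below is only an illustration: the two-sentence corpus is made up and stands in for your files, and the variable names present and missing are mine.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the real documents (hypothetical data).
docs = [
    "the algorithm sorts the billing records",
    "billing records are stored as text",
]

count_vect = CountVectorizer()
count_vect.fit_transform(docs)

# vocabulary_ is a dict mapping each learned term to its column index.
vocab = count_vect.vocabulary_

present = vocab.get("algorithm")   # an integer column index
missing = vocab.get("algoritham")  # None, because the key is absent

print(present, missing)
print("algoritham" in vocab)       # membership test avoids the silent None
```

If you would rather get an error than a silent None, index the dict directly (vocab["algoritham"]), which raises KeyError for a term that was never seen.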


