
Developers Planet


Ravina Singh February 2016

Getting AttributeError on nltk Textual entailment classifier

I'm referring to the code in this section of the NLTK book: http://www.nltk.org/book/ch06.html#recognizing-textual-entailment

import nltk

def rte_features(rtepair):
    extractor = nltk.RTEFeatureExtractor(rtepair)
    features = {}
    features['word_overlap'] = len(extractor.overlap('word'))
    features['word_hyp_extra'] = len(extractor.hyp_extra('word'))
    features['ne_overlap'] = len(extractor.overlap('ne'))
    features['ne_hyp_extra'] = len(extractor.hyp_extra('ne'))
    return features

rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])

extractor = nltk.RTEFeatureExtractor(rtepair)
AttributeError                            Traceback (most recent call last)
<ipython-input-39-a7f96e33ba9e> in <module>()
----> 1 extractor = nltk.RTEFeatureExtractor(rtepair)

C:\Users\RAVINA\Anaconda2\lib\site-packages\nltk\classify\rte_classify.pyc in __init__(self, rtepair, stop, lemmatize)
     66         #Get the set of word types for text and hypothesis
---> 67         self.text_tokens = tokenizer.tokenize(rtepair.text)
     68         self.hyp_tokens = tokenizer.tokenize(rtepair.hyp)
     69         self.text_words = set(self.text_tokens)

AttributeError: 'list' object has no attribute 'text'

It's the exact code from the book. Can anyone help me figure out what's going wrong here? Thanks, Ravina


helios35 February 2016

Take a look at the types involved. Type this into the Python shell:

import nltk
x = nltk.corpus.rte.pairs(['rte3_dev.xml'])
type(x)

which tells you that x is a list.

Now, type:

help(nltk.RTEFeatureExtractor)

which tells you, among other things:

:param rtepair: a RTEPair from which features should be extracted

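Clearly, x does not have the correct type for nltk.RTEFeatureExtractor. A single item of the list, on the other hand:

type(x[0])

returns

<class 'nltk.corpus.reader.rte.RTEPair'>

so a single item does have the correct type. That is what the book does: it indexes the list, nltk.corpus.rte.pairs(['rte3_dev.xml'])[33], before passing the pair to nltk.RTEFeatureExtractor.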
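Here is a stdlib-only sketch of the same mismatch, assuming nothing about NLTK beyond what the traceback shows (the extractor reads rtepair.text and rtepair.hyp). FakePair and extract_tokens are hypothetical stand-ins for RTEPair and the first step of RTEFeatureExtractor.__init__; with real NLTK you would pass x[0], or loop over x, in the same way.

```python
from typing import List, NamedTuple


class FakePair(NamedTuple):
    """Hypothetical stand-in for nltk.corpus.reader.rte.RTEPair."""
    text: str
    hyp: str


def extract_tokens(rtepair):
    """Hypothetical stand-in for the first thing the extractor does:
    read the .text and .hyp attributes of ONE pair."""
    return rtepair.text.split(), rtepair.hyp.split()


# Like nltk.corpus.rte.pairs(...), this is a *list* of pairs.
pairs: List[FakePair] = [
    FakePair("The cat sat on the mat.", "A cat sat."),
    FakePair("Paris is in France.", "Paris is in Germany."),
]

# Passing the whole list reproduces the error from the question.
try:
    extract_tokens(pairs)
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'text'

# Passing a single element works; loop to handle every pair.
for pair in pairs:
    text_tokens, hyp_tokens = extract_tokens(pair)
    print(len(text_tokens), len(hyp_tokens))
```

In the book's workflow this corresponds to calling rte_features(pair) once per element, rather than once on the list returned by pairs().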

Update: As mentioned in the comments, extractor.text_words contains only empty strings, even when a single pair is passed. This seems to be due to changes made in NLTK since the book was written. Long story short: you won't be able to fix this without downgrading to an older version of NLTK or fixing the problem in NLTK yourself. Inside the file nltk/classify/rte_classify.py, you will find the following code (excerpt):

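class RTEFeatureExtractor(object):
    def __init__(self, rtepair, stop=True, lemmatize=False):
        # ... (other setup code omitted)
        from nltk.tokenize import RegexpTokenizer
        tokenizer = RegexpTokenizer('([A-Z]\.)+|\w+|\$[\d\.]+')

        # Get the set of word types for text and hypothesis
        self.text_tokens = tokenizer.tokenize(rtepair.text)
        self.text_words = set(self.text_tokens)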
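The token lists are then turned into sets, which is why empty tokens make the features useless. Below is a deliberately simplified model of the two word features, assuming overlap is the intersection of hypothesis and text word sets and hyp_extra is the set of hypothesis words absent from the text. word_features is a hypothetical helper; NLTK's real extractor also drops stopwords and treats named entities separately, so its exact counts can differ.

```python
def word_features(text_tokens, hyp_tokens):
    """Hypothetical, simplified version of word_overlap / word_hyp_extra:
    compare the *sets* of tokens in the text and the hypothesis."""
    text_words = set(text_tokens)
    hyp_words = set(hyp_tokens)
    return {
        "word_overlap": len(hyp_words & text_words),
        "word_hyp_extra": len(hyp_words - text_words),
    }


# Well-tokenized input: the features tell the two sentences apart.
good = word_features(["Paris", "is", "in", "France"],
                     ["Paris", "is", "in", "Germany"])

# Broken tokenizer output: every token is '', so both sets collapse to {''}.
broken = word_features(["", "", "", ""], ["", "", "", ""])

print(good)    # {'word_overlap': 3, 'word_hyp_extra': 1}
print(broken)  # {'word_overlap': 1, 'word_hyp_extra': 0}
```

In this model, every pair whose tokens are all empty strings gets the same word features, so a classifier trained on them cannot learn anything from word overlap.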

If you run the same RegexpTokenizer with the exact text from the extractor, it will produce only empty strings:

import nltk
from nltk.tokenize import RegexpTokenizer

rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
tokenizer = RegexpTokenizer(r'([A-Z]\.)+|\w+|\$[\d\.]+')
tokenizer.tokenize(rtepair.text)

Returns ['', '', …, ''] (i.e., a list of empty strings).
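The empty strings are most likely explained by how Python's re.findall treats capturing groups: when the pattern contains a group, findall returns the group's text instead of the whole match, and '' when that group did not take part in the match. The first alternative above, ([A-Z]\.)+, is a capturing group, so ordinary words matched by \w+ come back as ''. A minimal stdlib sketch, assuming RegexpTokenizer returns findall results for this pattern (which fits the behaviour reported above); the sample sentence is made up:

```python
import re

# Pattern from rte_classify.py: the first alternative is a *capturing* group.
capturing = r'([A-Z]\.)+|\w+|\$[\d\.]+'
# Same pattern with a non-capturing group (?:...), so findall returns whole matches.
non_capturing = r'(?:[A-Z]\.)+|\w+|\$[\d\.]+'

text = "The U.S. paid $3.50 for apples"

# Words and amounts yield '' (group unused); "U.S." yields only the
# last repetition of the group, "S.".
print(re.findall(capturing, text))
# ['', 'S.', '', '', '', '']

print(re.findall(non_capturing, text))
# ['The', 'U.S.', 'paid', '$3.50', 'for', 'apples']
```

Changing the group to (?:[A-Z]\.)+ in your local copy of rte_classify.py is one way of "fixing the problem in NLTK yourself"; treat it as an untested local patch and check it against the NLTK version you have installed.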


