user1362215 February 2016

How do I modify strings generated by an element.strings method in BeautifulSoup?

When there are tags within other tags (e.g., the <b> within <p>), the .string attribute of the parent element is None, and the .strings attribute is a generator that yields all the strings inside it.

<html>
<body>
<p> First p <b> First b </b>second part first p</p>
<p> Second p <a> first link</a> second part second p <a> second link</a> third part second p</p> 
</body>
</html>

In my code,

from bs4 import BeautifulSoup

soup = BeautifulSoup(html)  # html holds the markup above
ps = soup.find_all('p')
p0 = ps[0]
for s in p0.strings:
    # make sure strings inside child elements of the <p> tag are skipped
    if s.findParent() == p0:
        s.replace_with('new text')

However, when I run this, I get

Traceback (most recent call last):
      File "<pyshell#243>", line 1, in <module>
        s.replace_with('new_text')
      File "/usr/lib/python2.7/dist-packages/bs4/element.py", line 211, in replace_with
        my_index = self.parent.index(self)
    AttributeError: 'NoneType' object has no attribute 'index'

The first string of p0 had its text changed, but the last element did not, since an error was thrown. The same thing happens with the second element of p1 = ps[1]. How do I go about modifying each of the string elements separately? I want to preserve all of the existing tags.

Answers


John La Rooy February 2016

This loop is not safe because you are modifying p0 while you iterate over it: replace_with changes the tree that the p0.strings generator is still walking.

for s in p0.strings:

One safe way is to make a list that snapshots the strings of p0 before you iterate over them.

for s in list(p0.strings):
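
For reference, here is a minimal sketch of the whole loop with that change applied to the markup from the question. The "html.parser" argument and the `s.parent is p0` check are my own choices for the sketch (the question uses the default parser and findParent()), so treat it as an illustration rather than the only way to do it.

```python
from bs4 import BeautifulSoup

html = """<html>
<body>
<p> First p <b> First b </b>second part first p</p>
<p> Second p <a> first link</a> second part second p <a> second link</a> third part second p</p>
</body>
</html>"""

# "html.parser" is an assumption made here so the sketch needs no extra parser package.
soup = BeautifulSoup(html, "html.parser")
p0 = soup.find_all("p")[0]

# list() collects every string up front, so replacing one of them
# no longer disturbs the iteration.
for s in list(p0.strings):
    # Only replace strings that are direct children of the <p>;
    # the text inside <b> is left alone and the tag itself is preserved.
    if s.parent is p0:
        s.replace_with("new text")

print(p0)
```

The same loop should work for ps[1], where the strings inside the <a> tags are skipped in the same way.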

Post Status

Asked in February 2016
Viewed 3,807 times
Voted 5
Answered 1 times
