svgajendra February 2016

Empty the text content between tags in XML

I tried to parse the XML below with a DOM parser, but an error is thrown during parsing because the CDATA section of the 'b' element contains special characters. We only need the text content of the c and e elements, so I am trying to remove the 'b' element and then use indexOf to get the text content of c and e.

   <a><b><![CDATA[userinput]]></b><c>text of c</c><d></d><e>text of e</e></a>
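The indexOf step described above could be sketched roughly as follows. This is only an illustration: the class TextBetweenTags and the helper textOf are hypothetical names, and the sketch assumes the 'b' element has already been removed, since a CDATA section could itself contain a literal <c> or <e>.

```java
class TextBetweenTags {
    // Hypothetical helper: returns the text between the first <tag> and the
    // next </tag> after it, or null if either marker is missing.
    static String textOf(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = xml.indexOf(close, start);
        return end < 0 ? null : xml.substring(start, end);
    }

    public static void main(String[] args) {
        // The sample document with the 'b' element already stripped.
        String xml = "<a><c>text of c</c><d></d><e>text of e</e></a>";
        System.out.println(textOf(xml, "c")); // prints "text of c"
        System.out.println(textOf(xml, "e")); // prints "text of e"
    }
}
```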

Below is the code I am using for the pattern matching:

// Lazily match from <b> up to the first </b> that follows it.
Pattern pattern = Pattern.compile("<b>.*?</b>", Pattern.DOTALL | Pattern.UNICODE_CASE | Pattern.MULTILINE);
Matcher matcher = pattern.matcher(input);
StringBuilder builder = new StringBuilder();
int lastIndex = 0;
while (matcher.find()) {
    // Keep the text before the match and skip the match itself.
    builder.append(input.substring(lastIndex, matcher.start()));
    lastIndex = matcher.end();
}
// Keep whatever follows the last match.
builder.append(input.substring(lastIndex));

The scenario below fails with the above pattern:

 <a><b><![CDATA[test 123 </b><c>inside</c>]]></b><c>outside</c><d></d><e>out side</e></a>

Output:

 <a><c>inside</c>]]></b><c>outside</c><d></d><e>out side</e></a>

Expected:

<a><c>outside</c><d></d><e>out side</e></a>

Could you please let me know the best way to resolve this issue? The user input can be arbitrary text.

Thanks in advance, Gajendra

Answers


hlastras February 2016

Try a greedy match instead:

String output = input.replaceAll("<b>.*</b>", "");
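Note that a greedy ".*" runs to the last </b> it can reach, so it would also remove anything sitting between two separate b elements, and without DOTALL it does not cross line breaks. A possible alternative is to let the pattern skip whole CDATA sections, so that a </b> inside CDATA is never treated as the closing tag. The following is a minimal sketch under some assumptions: the b element has no attributes and contains only plain text and CDATA sections (no child elements or comments). The class name StripB and the method stripB are made up for illustration.

```java
import java.util.regex.Pattern;

class StripB {
    // <b>, then any mix of complete CDATA sections and plain text, then </b>.
    // A CDATA section ends at its first "]]>", so a "</b>" inside it is skipped.
    private static final Pattern B_ELEMENT = Pattern.compile(
            "<b>(?:<!\\[CDATA\\[.*?\\]\\]>|[^<]++)*</b>", Pattern.DOTALL);

    // Removes every b element that fits the assumptions stated above.
    static String stripB(String input) {
        return B_ELEMENT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<a><b><![CDATA[test 123 </b><c>inside</c>]]></b>"
                + "<c>outside</c><d></d><e>out side</e></a>";
        // prints <a><c>outside</c><d></d><e>out side</e></a>
        System.out.println(stripB(input));
    }
}
```

If the remaining document is well-formed after this step, it can then be handed to the DOM parser instead of relying on string searches.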

