Developers Planet

svgajendra February 2016

Empty the text content between tags in XML

I tried to parse the XML below using a DOM parser, but an error is thrown during parsing because some special characters are present inside the CDATA section of the 'b' element. We only need the text content of the c and e elements, so I am trying to empty the 'b' element and then use indexOf to get the text content of both c and e.

   <a><b><![CDATA[userinput]]></b><c>text of c</c><d></d><e>text of e</e></a>

Below is the code I am using for pattern matching:

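import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("<b>.*?</b>", Pattern.DOTALL | Pattern.UNICODE_CASE | Pattern.MULTILINE);
Matcher matcher = pattern.matcher(input);
StringBuilder builder = new StringBuilder();
int lastIndex = 0;
while (matcher.find()) {
    // Copy the text before this <b>...</b> match and skip the match itself.
    builder.append(input.substring(lastIndex, matcher.start()));
    lastIndex = matcher.end();
}
builder.append(input.substring(lastIndex));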

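The scenario below fails with the above pattern, because the non-greedy match stops at the first </b>, which is inside the CDATA section.

Input:

 <a><b><![CDATA[test 123 </b><c>inside</c>]]></b><c>outside</c><d></d><e>out side</e></a>

Actual output:

 <a><c>inside</c>]]></b><c>outside</c><d></d><e>out side</e></a>

Expected output: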

<a><c>outside</c><d></d><e>out side</e></a>

Could you please let me know the best way to resolve this issue? The user input may be any arbitrary text.

Thanks in advance,
Gajendra


hlastras February 2016

Try this:

String output = input.replaceAll("<b>.*</b>", "");
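
Note that the greedy `.*` removes everything from the first `<b>` to the last `</b>` in the input, and that `.` does not match line breaks unless DOTALL is enabled. As an alternative, here is a minimal sketch of a CDATA-aware pattern, assuming `<b>` carries no attributes and contains only character data and CDATA sections (no child elements). The class name `StripB` and the helpers `stripB` and `textOf` are hypothetical names chosen for illustration, not part of the original question or answer.

```java
import java.util.regex.Pattern;

final class StripB {
    // Hypothetical pattern: a <b> element whose body is any mix of complete
    // CDATA sections and plain character data. A CDATA section always ends at
    // the first "]]>", so a "</b>" inside it is skipped over, not matched.
    private static final Pattern B_ELEMENT = Pattern.compile(
            "<b>(?:<!\\[CDATA\\[.*?\\]\\]>|[^<]++)*</b>", Pattern.DOTALL);

    // Removes every <b>...</b> element that fits the assumptions above.
    static String stripB(String xml) {
        return B_ELEMENT.matcher(xml).replaceAll("");
    }

    // Returns the text between the first <tag> and the following </tag>,
    // or null if either is missing. Uses indexOf, as the question proposes.
    static String textOf(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = xml.indexOf(close, start);
        return end < 0 ? null : xml.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "<a><b><![CDATA[test 123 </b><c>inside</c>]]></b>"
                + "<c>outside</c><d></d><e>out side</e></a>";
        String cleaned = stripB(input);
        System.out.println(cleaned);
        System.out.println(textOf(cleaned, "c"));
        System.out.println(textOf(cleaned, "e"));
    }
}
```

For the failing input from the question, `stripB` should leave `<a><c>outside</c><d></d><e>out side</e></a>`, after which `textOf` can read the c and e elements. This remains a regex workaround rather than real XML parsing: if the text outside `<b>` can itself contain something that looks like `<b>` or `<c>`, the assumptions above no longer hold.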

