Vivek Sable February 2016

Extract a block from CSS with a regular expression or any other method?

I have to extract the text between /*Custom-D Start*/ and /*Custom-D End*/. There may be spaces after /* and before */.

I did this in two steps:

  1. A regular expression to extract the start and end text.
  2. The string find method to get the text between the start and end text.

The following is my code:

data ="""/* Highlighting edits text on TIB ONLY and NOT ON PDF Output. 
   .main-igp selector makes this style apply only for TIB.  */
.main-igp .edit1 {color: rgb(235, 127, 36)}
.main-igp .edit2 {color: rgb(0, 0, 180);}
/*Custom-D Start     */
.main-igp .edit3 {color: rgb(0, 180, 180);}
.main-igp .edit6 {color: rgb(200, 200, 0);}
/* Custom-D End */
/* Production Note ===== */
p.production-note-rw {
    display: none;}
/* Production Note END ===== */"""


import re


def extractCustomD():
    """Extract the Custom-D block from the CSS in `data`.

    The block starts at /*Custom-D Start*/ and ends at /*Custom-D End*/.
    There may be spaces after /* and before */.
    """
    try:
        # Raw strings keep the backslashes intact; the trailing / makes
        # each pattern match the whole comment marker, not just "/*...*".
        start_text = re.findall(r"/\* *Custom-D Start *\*/", data)[0]
        end_text = re.findall(r"/\* *Custom-D End *\*/", data)[0]
    except IndexError:
        # One of the markers is missing.
        return ""
    start = data.find(start_text) + len(start_text)
    return data[start:data.find(end_text)]

Can we extract the target content with a single regular expression, or is there another way to do this?

Edit: The following works for me:

>>> re.findall(r"/\* *Custom-D Start *\*/([\s\S]*)/\* Custom-D End \*/", data)
['\n.main-igp .edit3 {color: rgb(0, 180, 180);}\n.main-igp .edit6 {color: rgb(200, 200, 0);}\n']

Answers


Wiktor Stribiżew February 2016

Currently, your regular expressions only match the /*Custom-D Start */ and /* Custom-D End */ substrings themselves. What you need is the text between them.

You can use a single regular expression to extract that substring:

/\* *Custom-D Start *\*/\s*(.*?)/\* *Custom-D End *\*/

See the regex demo. Use it with the re.S (re.DOTALL) flag so that . also matches newlines.
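
To see why the flag and the lazy quantifier matter, here is a small sketch. The sample string css is made up for illustration and is not the asker's data. Without re.S the dot cannot cross line breaks, so nothing matches, and a greedy .* would run on to the last end marker when the input contains several blocks.

```python
import re

# Hypothetical sample with two Custom-D blocks.
css = (
    "/* Custom-D Start */\n.a {}\n/* Custom-D End */\n"
    ".b {}\n"
    "/* Custom-D Start */\n.c {}\n/* Custom-D End */\n"
)

lazy = r'/\* *Custom-D Start *\*/\s*(.*?)/\* *Custom-D End *\*/'
greedy = r'/\* *Custom-D Start *\*/\s*(.*)/\* *Custom-D End *\*/'

# Without re.S the dot stops at newlines, so no block is found.
no_flag = re.findall(lazy, css)

# With re.S, the lazy version stops at the nearest end marker.
lazy_blocks = re.findall(lazy, css, re.S)

# The greedy version swallows everything up to the last end marker.
greedy_blocks = re.findall(greedy, css, re.S)

print(no_flag)
print(lazy_blocks)
print(greedy_blocks)
```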

See IDEONE demo:

import re
p = re.compile(r'/\* *Custom-D Start *\*/\s*(.*?)/\* *Custom-D End *\*/', re.DOTALL)
test_str = "/* Highlighting edits text on TIB ONLY and NOT ON PDF Output. \n   .main-igp selector makes this style apply only for TIB.  */\n.main-igp .edit1 {color: rgb(235, 127, 36)}\n.main-igp .edit2 {color: rgb(0, 0, 180);}\n/*Custom-D Start     */\n.main-igp .edit3 {color: rgb(0, 180, 180);}\n.main-igp .edit6 {color: rgb(200, 200, 0);}\n/* Custom-D End */\n/* Production Note ===== */\np.production-note-rw {\n    display: none;}\n/* Production Note END ===== */"
m = p.search(test_str)
if m:
    print(m.group(1))
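
If you want to keep the shape of your original function, the same pattern can be wrapped in a small helper. This is only a sketch: the name extract_custom_d and the sample string css are hypothetical, and returning an empty string when no block is found simply mirrors what your extractCustomD does.

```python
import re

# Same pattern as above; re.DOTALL lets . match newlines inside the block.
CUSTOM_D = re.compile(
    r'/\* *Custom-D Start *\*/\s*(.*?)/\* *Custom-D End *\*/', re.DOTALL
)


def extract_custom_d(css):
    """Return the text between the Custom-D markers, or '' if absent.

    Hypothetical helper, for illustration only.
    """
    m = CUSTOM_D.search(css)
    return m.group(1) if m else ""


# Hypothetical sample input with irregular spacing inside the markers.
css = "a {}\n/*Custom-D Start   */\n.x {color: red;}\n/*  Custom-D End*/\nb {}"
print(repr(extract_custom_d(css)))
print(repr(extract_custom_d("a {}")))
```

Passing the CSS as an argument, rather than reading a module-level data variable, makes the helper easier to reuse and test.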

Note that you can unroll the lazy dot matching into:

/\* *Custom-D Start *\*/\s*([^/]*(?:/(?!\* *Custom-D End *\*/)[^/]*)*)

This version is faster than the one with lazy dot matching. It also does not need re.S, because the negated character class [^/] already matches newlines.
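
As a rough check that the two patterns capture the same text, here is a minimal sketch. The sample string css is hypothetical; it places slashes inside the block (in a url() value) to exercise the /(?!...) branch of the unrolled pattern.

```python
import re

lazy = re.compile(
    r'/\* *Custom-D Start *\*/\s*(.*?)/\* *Custom-D End *\*/', re.DOTALL
)
# No re.DOTALL here: [^/] already matches newlines.
unrolled = re.compile(
    r'/\* *Custom-D Start *\*/\s*([^/]*(?:/(?!\* *Custom-D End *\*/)[^/]*)*)'
)

# Hypothetical sample; the url() value puts a slash inside the block.
css = (
    "/*Custom-D Start */\n"
    ".a {background: url(img/a.png);}\n"
    "/* Custom-D End */\n"
    ".b {}\n"
)

lazy_result = lazy.search(css).group(1)
unrolled_result = unrolled.search(css).group(1)
print(repr(unrolled_result))
print(lazy_result == unrolled_result)

# Input where the end marker is missing (also hypothetical).
unterminated = "/*Custom-D Start*/\nx"
print(lazy.search(unterminated))
print(repr(unrolled.search(unterminated).group(1)))
```

One difference to keep in mind: the unrolled pattern as written does not require the end marker. If the marker is missing, it captures through to the end of the input, whereas the lazy version finds no match at all.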

Post Status

Asked in February 2016
Viewed 3,791 times
Voted 8
Answered 1 time
