python - Regex catastrophic backtracking
I am using a regex to remove CSS comments from the content of an input document, using the following code:
text = re.sub('/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/', '', input_text)
However, I am running into catastrophic backtracking, and I'm pretty sure it has to do with the *+ nested within the * near the end of the expression. I'm not sure how to rewrite the regex so that it does the same thing without nested quantifiers. I need to remove comments of these forms:
/* remove text */

/*
 * remove text
 */

/*******
 * remove text
 *******/
Can anyone offer a little help in rewriting this regex to avoid this situation? I don't understand how I can still accomplish the same task without nested quantifiers. I'd appreciate it a lot!
You can use a simple regex that relies on the single-line (DOTALL) flag, like this:
/\*.*?\*/ or /[*].*?[*]/
You can use it from Python like this:
import re

# re.DOTALL lets "." match newlines, so the lazy ".*?" can span multi-line comments.
p = re.compile(r'/\*.*?\*/', re.DOTALL)
test_str = "/* remove text */\n\n/*\n * remove text\n */\n\n/*******\n * remove text\n *******/"
subst = ""
result = re.sub(p, subst, test_str)
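To see why the lazy quantifier matters, here is a minimal sketch that compares the lazy pattern from the answer with a greedy variant on input containing two comments. The helper name strip_css_comments and the sample CSS string are made up for illustration and are not part of the original answer.

```python
import re

# Lazy pattern from the answer; re.DOTALL lets "." cross line breaks.
LAZY = re.compile(r'/\*.*?\*/', re.DOTALL)
# Greedy variant, shown only for contrast.
GREEDY = re.compile(r'/\*.*\*/', re.DOTALL)


def strip_css_comments(css):
    """Remove every /* ... */ comment (hypothetical helper name)."""
    return LAZY.sub('', css)


css = "a { } /* one */ b { } /* two */ c"

# Lazy: each comment is matched on its own, the rule between them survives.
lazy_result = strip_css_comments(css)

# Greedy: one match runs from the first "/*" to the last "*/",
# so the "b { }" rule between the comments is removed as well.
greedy_result = GREEDY.sub('', css)

print(repr(lazy_result))
print(repr(greedy_result))
```

With the lazy pattern, an unterminated comment such as "p { } /* open" simply does not match and the text is returned unchanged.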