- AuthorPosts
- August 25, 2011 at 9:27 pm #9606 Deipotent (Participant)
Related to my original post about a regex for finding duplicate lines:
http://www.emeditor.com/modules/newbb/viewtopic.php?topic_id=3&forum=3&post_id=5812#forumpost5812
and my subsequent enhancement request about ignoring \r in a regex:
http://www.emeditor.com/modules/newbb/viewtopic.php?topic_id=1811&post_id=5888&forum=4#forumpost5888
it would appear there is a bug when “\r” appears in a regex and is followed by a regex operator like ? or * (and probably others).
From playing around with it, it looks like EmEditor just removes the “\r”, so when there is a regex like
^(.*)(\r?\n\1)+$
EmEditor silently strips out “\r”, leaving
^(.*)(?\n\1)+$
which is incorrect.
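To illustrate why the stripped pattern is a problem, here is a minimal sketch in Python. It only models the behaviour described above (dropping every "\r" token from the pattern text); it is not EmEditor's actual code, and EmEditor's regex engine may report the resulting pattern differently than Python's re module does.

```python
import re

# The pattern from the post, with its backslashes intact.
pattern = r"^(.*)(\r?\n\1)+$"

# Assumed model of the reported behaviour: remove the two-character
# token "\r" wherever it appears in the pattern text.
stripped = pattern.replace(r"\r", "")
print(stripped)  # ^(.*)(?\n\1)+$

# The orphaned "?" now directly follows "(", which Python's re module
# rejects as an unknown group extension.
try:
    re.compile(stripped)
    stripped_is_valid = True
except re.error as exc:
    stripped_is_valid = False
    print("invalid pattern:", exc)
```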
A few possible solutions involve:
1) Checking if the “\r” is followed by an operator which applies to the previous character (e.g. ? or *), and stripping that operator as well.
2) Replacing the “\r” with “(\r)” seems to solve the problem, but that would then affect back references.
3) If possible, tell the regex engine to ignore “\r”.
Option (3) may be the best option if available, since it will probably handle all cases properly.
Either way, it’s not as easy as it would first seem.
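As a rough sketch of option (1), the following Python function removes each "\r" token together with any quantifier that immediately follows it. The names strip_cr and _QUANTIFIER are hypothetical, chosen for illustration; this is not how EmEditor is implemented. It deliberately ignores character classes (for example, "[\r]" would be left as an empty "[]"), which is one reason the problem is harder than it first looks.

```python
import re

# A quantifier that may follow an atom: ?, *, +, or {m}, {m,}, {m,n},
# optionally followed by a lazy "?" or possessive "+" modifier.
_QUANTIFIER = re.compile(r"(?:[?*+]|\{\d+(?:,\d*)?\})[?+]?")

def strip_cr(pattern: str) -> str:
    """Remove every "\\r" token and the quantifier attached to it."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            # Treat a backslash plus the next character as one token,
            # so an escaped backslash followed by "r" is left alone.
            token = pattern[i:i + 2]
            i += 2
            if token == r"\r":
                quantifier = _QUANTIFIER.match(pattern, i)
                if quantifier:
                    i = quantifier.end()  # drop the quantifier too
                continue
            out.append(token)
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)

cleaned = strip_cr(r"^(.*)(\r?\n\1)+$")
print(cleaned)  # ^(.*)(\n\1)+$
```

Option (2) avoids the orphaned operator without any parsing, but as noted it adds a capturing group and so shifts the numbering of later back references.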
- September 16, 2011 at 4:37 pm #9647 Yutaka Emura (Keymaster)
Hi Deipotent,
As you wrote, EmEditor strips ‘\r’ from regular expressions when you use Find. This is because a new line is always represented as ‘\n’, never ‘\r\n’, regardless of whether ‘\r’, ‘\n’, or ‘\r\n’ is used for a new line in the file. This is currently the specified behavior, because in earlier versions many users were confused when they needed to specify ‘\r’ or ‘\r\n’ for a new line.
When you specify a new line, please use ‘\n’, and not ‘\r’.
Thank you!
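For readers who want to try the recommended form outside EmEditor, here is a small Python sketch. It assumes the text is first normalized so that every line ending is "\n", which mirrors the representation described in the reply; the helper normalize_newlines and the sample text are made up for illustration.

```python
import re

def normalize_newlines(text: str) -> str:
    """Convert "\\r\\n" and lone "\\r" line endings to "\\n"."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

# The duplicate-line pattern written with "\n" only, as recommended.
DUPLICATES = re.compile(r"^(.*)(\n\1)+$", re.MULTILINE)

# Sample text that mixes all three line-ending styles.
sample = "alpha\r\nalpha\r\nbeta\rgamma\ngamma\n"
normalized = normalize_newlines(sample)

# Collect the text of each line that is immediately repeated.
runs = [m.group(1) for m in DUPLICATES.finditer(normalized)]
print(runs)  # ['alpha', 'gamma']
```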