Strange behavior – EmEditor (Text Editor)

Viewing 5 posts - 1 through 5 (of 5 total)

Author
Posts
November 29, 2007 at 10:36 am #5068
jugaor
Participant
Hi, I tried several versions (5 through the 7 beta) and found the following ‘bugs’, in both manual and script searches (Spanish texts):
a por (eeFindReplaceOnlyWord)
matches “creería por”, “CAMPAÑA POR”, etc. (i.e., it breaks words at the accented vowels or “Ñ”/“ñ”)
any accented vowel (eeFindReplaceOnlyWord)
matches “diseñé”, “ENSEÑÓ”, etc. (i.e., in words ending in an accented vowel, it breaks the word at the “Ñ”/“ñ”)
In manual searches (with an open document), it matches all the accented vowels inside words despite “Search Only Word” (e.g., it matches “cómprale”, “mamá”, “después”, etc.)
(?!es |son )esta(s?)(!|?)
ignores the leading negative subexpression (i.e., it matches “esta!” / “esta?” / “estas!” / “estas?”), even though I use the ‘eeFindReplaceRegExp Or eeFindReplaceOnlyWord’ options
If I simplify the expression
(?!es) esta(!|?)
(?!es )esta(!|?)
or
(?!son) estas(!|?)
(?!son )estas(!|?)
it behaves the same way. However,
(¡|¿)esta(s?)(?! es| son)
works as expected.
If you need more information, please email me.
TIA.
jugaor
November 29, 2007 at 7:15 pm #5071
Yutaka Emura
Keymaster
As for your first question, previous versions of EmEditor did not check Unicode characters (character code > U+0080) when deciding where a word begins and ends, for the sake of speed. In the next beta version I will add a routine that checks some Latin characters (ch >= 0x00c0 && ch <= 0x02b8). This will not cover all Unicode characters, but it should improve "whole word" accuracy in most cases without sacrificing much speed.
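To make the whole-word change concrete, here is a minimal Python sketch. It is not EmEditor's code: the function names and the `extended` switch are made up for illustration, and only the range 0x00C0 to 0x02B8 comes from the post. It shows why an ASCII-only word test lets “a por” match inside “creería por”, and why the extra range stops that.

```python
# Hypothetical sketch of a "whole word" check; not EmEditor's actual routine.

def is_word_char(ch, extended=True):
    """Return True if ch counts as part of a word."""
    if ch.isascii() and (ch.isalnum() or ch == "_"):
        return True
    # Range quoted in the post: ch >= 0x00c0 && ch <= 0x02b8.
    return extended and 0x00C0 <= ord(ch) <= 0x02B8


def find_whole_word(text, word, extended=True):
    """Return start offsets where word has no word character on either side."""
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not is_word_char(text[start - 1], extended)
        after_ok = end == len(text) or not is_word_char(text[end], extended)
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


# ASCII-only test: "í" is not a word character, so "a por" looks like a whole word.
print(find_whole_word("creería por", "a por", extended=False))  # [6]
# With the extended range, "í" belongs to the word and the false match is gone.
print(find_whole_word("creería por", "a por"))                  # []
```

The sketch assumes precomposed characters (a single code point for “í” or “ñ”); text that uses combining accents would need different handling.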
I am not sure about your second question, but there are two unnecessary spaces in your regular expression: (?!es |son )esta(s?)(!|?)
One is between “s” and “|”, and the other is between “n” and “)”.
Does removing these spaces solve your issue?
November 30, 2007 at 5:30 am #5074
jugaor
Participant
Hi, thank you very much for your response.
1. In Spanish, the ‘special’ letters are ÁÉÍÓÚÜ, áéíóúü, Ñ, ñ. I presume that these Unicode chars cover them :)
2. The spaces are needed, since they’re two whole words:
“esta” = “this” / “estas” = “these”, both feminine.
“es” = “is” (singular, verb to be)
“son” = “are” (plural, verb to be)
The strange thing is that EmEditor works correctly when the same subexpression comes after the word rather than before it (i.e. “(¡|¿)esta(s?)(?! es| son)” is correct).
I have been trying to use EmEditor to automatically correct misspelled words in Spanish subtitle files. I wrote some complex VBEE scripts for that, and that is how I found the issues above.
Thanks for your attention,
jugaor
PS: Please let me know when the new beta is ready :)
November 30, 2007 at 8:30 pm #5077
Yutaka Emura
Keymaster
(?=pattern) (positive lookahead) and (?!pattern) (negative lookahead) look ahead from the current position. At the start of an expression, that is the position where the match begins.
For example, the expression “(?=x)x” matches wherever “x” does, and the expression “(?!x)x” never matches.
So a (?=pattern) or (?!pattern) placed at the beginning of a search term cannot test the text that comes before the match.
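The same behavior can be reproduced outside EmEditor. The following is a small sketch using Python's `re` module, whose lookahead syntax matches the one discussed here; EmEditor's own engine may differ in other details. The variable names are illustrative, and `[!?]` stands in for `(!|?)` on the assumption that a literal question mark was intended.

```python
import re

# A lookahead is evaluated at the current position, looking forward.
always = re.compile(r"(?=x)x")   # same matches as plain "x"
never = re.compile(r"(?!x)x")    # demands "not x" and "x" at the same position

# Leading negative lookahead, as in the first post.
leading = re.compile(r"(?!es |son )estas?[!?]")

# Trailing negative lookahead, modelled on the variant that worked.
trailing = re.compile(r"[¡¿]estas?(?! es| son)")

print(always.search("x") is not None)         # True
print(never.search("x") is None)              # True
# The lookahead is checked where "esta!" starts, so the "es " before it is never seen.
print(leading.search("es esta!").group(0))    # esta!
# Here the lookahead inspects the text that follows "esta", which is what was intended.
print(trailing.search("¿esta es") is None)    # True
```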
I will release beta 41 today or tomorrow.
December 1, 2007 at 8:05 am #5080
jugaor
Participant
THANK YOU VERY MUCH! I tried beta 41 and the ‘special chars’ issue is gone! :D
Also, I see that I misunderstood the “look ahead” expression :-?
I needed to use the “look behind” one, (?<!pattern). Sorry about that!
Congratulations on your excellent work!
jugaor
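For readers with the same problem, here is a hedged sketch of the look-behind version in Python. Python's `re` module only accepts fixed-width look-behind, so “es ” and “son ” get separate assertions; whether EmEditor's engine needs the same split is not stated in this thread. The `\b` guards and the helper name `flagged` are additions for illustration, and `[!?]` again assumes a literal question mark was meant.

```python
import re

# Reject "esta"/"estas" only when the whole word "es" or "son" comes right before it.
# The \b keeps words that merely end in "es" (such as "antes") from blocking a match.
pattern = re.compile(r"(?<!\bes )(?<!\bson )estas?[!?]")


def flagged(text):
    """Return every match of the pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


print(flagged("es esta!"))     # []
print(flagged("son estas?"))   # []
print(flagged("mira esta!"))   # ['esta!']
print(flagged("antes esta!"))  # ['esta!']
```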