- AuthorPosts
- November 29, 2007 at 10:36 am #5068 jugaor (Participant)
Hi, I tried several versions (5 through the 7 beta) and found the following ‘bugs’, in both manual and script searches (Spanish texts):

a por (eeFindReplaceOnlyWord)
matches “creería por”, “CAMPAÑA POR”, etc. (i.e., it breaks words at accented vowels or “Ñ”/“ñ”).

any accented vowel (eeFindReplaceOnlyWord)
matches “diseñé”, “ENSEÑÓ”, etc. (i.e., for words ending in an accented vowel, it breaks the word at the “Ñ”/“ñ”).

In manual searches (with an open document), it matches all accented vowels inside words despite “Search Only Word” (e.g., it matches “cómprale”, “mamá”, “después”, etc.).

(?!es |son )esta(s?)(!|?)
ignores the leading negative subexpression (i.e., it matches “esta!” / “esta?” / “estas!” / “estas?”), even though I use the ‘eeFindReplaceRegExp Or eeFindReplaceOnlyWord’ options.

If I simplify the expression to
(?!es) esta(!|?)
(?!es )esta(!|?)
or
(?!son) estas(!|?)
(?!son )estas(!|?)
it behaves the same way. However,
(¡|¿)esta(s?)(?! es| son)
excludes the correct ones.

If you need more information, please email me.
TIA.
jugaor
- November 29, 2007 at 7:15 pm #5071 Yutaka Emura (Keymaster)
As far as your first question is concerned, previous versions of EmEditor did not check Unicode characters (character code > U+0080), for the sake of speed. However, I will add a routine to check some Latin characters (ch >= 0x00c0 && ch <= 0x02b8) in the next beta version. This addition will not cover all Unicode characters, but it should still improve "whole word" accuracy in most cases without sacrificing much speed.
I was not sure about your latter question, but there are two unnecessary spaces in your regular expression: (?!es |son )esta(s?)(!|?)
One is between “s” and “|”, and the other is between “n” and “)”.
Doesn't removing these spaces solve your issue?
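To illustrate the kind of check described above, here is a minimal Python sketch of a "whole word" test that treats the quoted Latin range (0x00C0 to 0x02B8) as word characters. It is not EmEditor's actual code: the function names `is_word_char` and `find_whole_word` are hypothetical, and treating ASCII letters, digits, and underscore as the baseline word characters is an assumption.

```python
def is_word_char(ch):
    """Hypothetical word-character test (not EmEditor's real routine)."""
    if ch.isascii():
        # Assumed baseline: ASCII letters, digits, and underscore.
        return ch.isalnum() or ch == "_"
    # The Latin range quoted in the post: ch >= 0x00c0 && ch <= 0x02b8.
    return 0x00C0 <= ord(ch) <= 0x02B8


def find_whole_word(text, word):
    """Return the start offsets where `word` has no word character on either side."""
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


print(find_whole_word("creería por", "a por"))       # [] because "í" now counts as a word character
print(find_whole_word("vamos a por ello", "a por"))  # [6]
```

With an ASCII-only test, the "a" after "í" in “creería” would look like the start of a word, which is the behavior reported in the first post.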
- November 30, 2007 at 5:30 am #5074 jugaor (Participant)
Hi, thank you very much for your response.
1. In Spanish, the ‘special’ letters are ÁÉÍÓÚÜ, áéíóúü, Ñ, ñ. I presume that this Unicode range covers them :)
2. The spaces are needed, since they separate two whole words:
“esta” = “this” / “estas” = “these”, both feminine.
“es” = “is” (singular, verb to be)
“son” = “are” (plural, verb to be)
The strange thing is that EmEditor works correctly with the same subexpression placed after, not before (i.e. “(¡|¿)esta(s?)(?! es| son)” is correct).
I have been trying to use EmEditor to automatically correct misspelled words in Spanish subtitle files. I wrote some complex VBEE scripts for that, which is how I found the issues above.
Thanks for your attention,
jugaor
PS: please write to me when the new beta is ready :)
- November 30, 2007 at 8:30 pm #5077 Yutaka Emura (Keymaster)
(?=pattern) (positive lookahead) and (?!pattern) (negative lookahead) look ahead from the position where they appear in the expression; they do not examine the text before it.
For example, the expression “(?=x)x” matches wherever “x” matches, and the expression “(?!x)x” never matches.
So placing (?=pattern) or (?!pattern) at the beginning of a search term cannot exclude matches based on the preceding text.
I will release beta 41 today or tomorrow.
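The lookahead behavior described above can be checked with a short Python sketch. It uses Python's `re` module rather than EmEditor's own regex engine, so it only illustrates the general semantics; the helper `matches` and the constants `LEADING` and `TRAILING` are names made up for this example, and the punctuation group from the original expression is left out to keep the patterns short.

```python
import re


def matches(pattern, text):
    """Return True if `pattern` is found anywhere in `text`."""
    return re.search(pattern, text) is not None


# A lookahead tests the text starting at its own position.
print(matches(r"(?=x)x", "axb"))  # True: matches wherever "x" does
print(matches(r"(?!x)x", "axb"))  # False: can never match

# Leading lookahead: tested at the start of "esta", so "es " before it is never seen.
LEADING = r"(?!es |son )estas?"
# Trailing lookahead: tested right after "esta", so a following " es" is seen.
TRAILING = r"estas?(?! es| son)"

print(matches(LEADING, "es esta"))     # True: the preceding "es " is not excluded
print(matches(TRAILING, "esta es"))    # False: the following " es" is excluded
print(matches(TRAILING, "esta casa"))  # True
```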
- December 1, 2007 at 8:05 am #5080 jugaor (Participant)
THANK YOU VERY MUCH! I tried beta 41 and the ‘special chars’ issue is gone! :D
Also, I see that I misunderstood the “look ahead” expression :-?
I needed to use the “look behind” one, (?<!pattern). Excuse me!
Congratulations on your excellent work!
jugaor
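As a rough illustration of the lookbehind approach mentioned in the last post, here is a sketch in Python's `re` module. It assumes the goal was to match “esta”/“estas” followed by “!” or “?” only when not preceded by “es ” or “son ”. The pattern is split into two lookbehinds because Python's `re` requires fixed-width lookbehind; EmEditor's engine may accept a different form. The names `PATTERN` and `flagged` are hypothetical.

```python
import re

# Two separate fixed-width negative lookbehinds: "es " (3 chars) and "son " (4 chars).
PATTERN = re.compile(r"(?<!es )(?<!son )estas?[!?]")


def flagged(line):
    """Return the substrings of `line` matched by PATTERN."""
    return [m.group(0) for m in PATTERN.finditer(line)]


print(flagged("¿es esta?"))    # [] because "es " precedes the word
print(flagged("son estas?"))   # [] because "son " precedes the word
print(flagged("mira estas!"))  # ['estas!']
```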