- AuthorPosts
August 8, 2021 at 9:35 am #27654 spiros Participant
Would diacritics-insensitive search and filtering be possible? It would be very useful for the many languages that use diacritics, and free libraries such as Lucene are available to do the job.
August 10, 2021 at 9:04 am #27661 Yutaka Emura Keymaster
Can you give some compelling examples?
August 14, 2021 at 3:15 am #27684 spiros Participant
A great number of languages use diacritics: https://web.library.yale.edu/cataloging/music/diacrit
For example, take a simple phrase in French: l’été arrive à la fin. If one searches without the diacritics (or with the wrong ones), nothing is found.
Or in polytonic Greek: ὃν οἱ θεοὶ φιλοῦσιν ἀποθνῄσκει νέος
Some software already does this, for example dtSearch (a desktop search tool).
It could be added as an extra toggle button like the “W” (for whole word search).
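To make the request concrete, here is a minimal sketch of diacritics-insensitive matching in Python. It assumes the usual approach of decomposing text to Unicode NFD and dropping nonspacing marks (category Mn). The function names are hypothetical, and nothing here describes how EmEditor, Lucene, or dtSearch implement it.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def contains_ignoring_diacritics(haystack: str, needle: str) -> bool:
    """Substring test after removing diacritics from both sides."""
    return strip_diacritics(needle) in strip_diacritics(haystack)


french = "l’été arrive à la fin"
greek = "ὃν οἱ θεοὶ φιλοῦσιν ἀποθνῄσκει νέος"

# A plain search for the unaccented form fails; the folded search succeeds.
print("ete" in french)
print(contains_ignoring_diacritics(french, "ete"))
print(contains_ignoring_diacritics(greek, "αποθνησκει"))
```

Folding both the query and the text means the user can type the search term with or without accents.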
August 31, 2022 at 5:45 pm #28347 Yutaka Emura Keymaster
Hello,
v21.9.903+ includes the Fuzzy Matching features. Please try it and let me know if you have any feedback.
https://www.emeditor.com/forums/topic/emeditor-v22-0-beta-21-9-901/
In the Find dialog box, select the Fuzzy Matching option, click the … button next to it to open the Fuzzy Matching Options dialog box, and select the “Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten” option.
September 17, 2022 at 1:55 am #28369 spiros Participant
I just saw this. I can confirm that it works great for Greek and Ancient Greek. Thank you so much, this is an amazing feature for me.
September 17, 2022 at 2:26 am #28370 spiros Participant
The only issue I can see so far is speed. I am not sure whether it could be made faster. I guess some sort of complex regex runs in the background.
September 17, 2022 at 9:10 am #28371 Yutaka Emura Keymaster
Did you set Similarity to 100% and select only the “Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten” option?
If so, and if it’s still slow, please describe your test conditions.
September 17, 2022 at 9:25 am #28372 spiros Participant
Similarity was left at the default; now that I have changed it to 100%, it is much faster. I guess the same settings are applied at the filter level?
September 17, 2022 at 1:57 pm #28373 Yutaka Emura Keymaster
The same fuzzy matching settings are applied to the filter and all other features.
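The thread does not say how EmEditor computes similarity, so the following Python sketch is only an illustration of why a threshold below 100% tends to cost more than exact matching. At 100% a plain substring scan is enough, while a lower threshold has to score candidate windows one by one. The functions and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import List


def find_exact(haystack: str, needle: str) -> List[int]:
    """Similarity 100%: a plain substring scan."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def find_similar(haystack: str, needle: str, similarity: float) -> List[int]:
    """Below 100%: score every window of the needle's length against the needle."""
    width = len(needle)
    return [
        i
        for i in range(len(haystack) - width + 1)
        if SequenceMatcher(None, haystack[i:i + width], needle).ratio() >= similarity
    ]


text = "color colour"
print(find_exact(text, "color"))
print(find_similar(text, "color", 0.8))
```

The second function does a full comparison per window, which is where the extra time goes in this toy version. A lower threshold also accepts overlapping near-matches, so it returns more hits.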
September 21, 2022 at 6:33 pm #28375 Yutaka Emura Keymaster
I think the Fuzzy Matching feature is more stable now. Please try the latest beta version, and let me know if you have any other comments before I release the official version.
September 22, 2022 at 2:37 am #28376 spiros Participant
Emura-san, when I read this reply I immediately thought of an option to add characters that will be considered equal to another character. After installing, I see that you have already done something like this. Congratulations! It is a truly amazing feature, and I think it is unique among text editors; you should point it out in your advertising material.
I am not clear yet whether I can do what I started to describe. For example, when the search string contains the character “κ”, I also want it to match the character “χ”. That is, if I search for οικιστας I also want it to find οἰχιστάς (with the diacritics-insensitive option on, of course). How can I add this condition? In the Add dialog I see minimum and maximum character, and when I add something in Treat as, it is copied automatically to Ignore.
Another point: it seems that when one selects options at regex level, these do not apply to the filter. I am not sure what the best way is here, but it would be good to have an option to link the two (i.e. changing options at regex level automatically applies the same options in the filter, and vice versa).
September 22, 2022 at 7:43 am #28377 Yutaka Emura Keymaster
If you want to treat κ as χ, click Add in the Fuzzy Matching Options dialog box, enter κ in both the Minimum character and Maximum character text boxes, and enter χ in the Treat as text box.
I am not sure what you mean by regex-level, but if you mean the options you can set in the Advanced dialog box (such as the Regular Expressions “.” Can Match Newline Characters option), you can’t just copy these options between Find and Filter. However, you can save all these options to macros, except for the Additional Lines to Search for Regular Expressions option.
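The effect of such a “Treat as” entry can be sketched in Python as a character mapping applied to both the query and the text after diacritics are removed. The `TREAT_AS` table and `fold` function are hypothetical names for illustration, and this is not EmEditor's implementation.

```python
import unicodedata

# Hypothetical stand-in for one "Treat as" entry: κ is treated as χ.
TREAT_AS = str.maketrans({"κ": "χ"})


def fold(text: str) -> str:
    """Drop nonspacing combining marks, then apply the equivalence table."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.translate(TREAT_AS)


query = "οικιστας"
text = "οἰχιστάς"

# Both sides fold to the same string, so the query matches the text.
print(fold(query) == fold(text))
```

Because the mapping is applied to both sides, searching for either κ or χ finds both letters.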
September 22, 2022 at 8:35 am #28378 spiros Participant
1. Great, got it. Perhaps it could be made more intuitive, or some context-sensitive help could be added? Also, for the character to be entered as a fuzzy match condition, one needs to click the OK button of the Character range dialog. Perhaps there could be an extra button near “Treat as”, like Add or Save?
2. I mean that after changing the Fuzzy Matching settings in the Find/Replace dialog, they did not seem to apply in Filter, so one had to click the Fuzzy button in Filter to change them. Now that I check again, it seems that they are actually the same. So problem solved.
September 23, 2022 at 7:42 am #28379 Yutaka Emura Keymaster
> 1. Great, got it. Perhaps it could be made more intuitive or some context-sensitive help added? Also, when adding the character, in order to be entered as a fuzzy match condition, one needs to click the OK button of the Character range dialog. Perhaps, an extra button near the “Treat as” like Add or Save?
I am not sure if I understand correctly, but I don’t want to add an extra OK button to the Character range dialog box. There is already an OK button in the top-right corner, and this design is similar to many other dialog boxes.
September 27, 2022 at 3:26 am #28383 tuska Participant
Hi,
Fuzzy Matching feature
Text: _Similitary Search_Ähnlichkeitssuche
Filter: Ahn
Result: Text will be hidden
Click on button “Fuzzy Matching” in
– emed64_21.9.901: Text is displayed and “Ähn” (without quotes) is marked
– emed64_21.9.912: Text remains hidden
German umlauts and special characters (ä,ö,ü,Ä,Ö,Ü,ß)
http://www.cp1252.com/
Ä Unicode 00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
Please check.
Thanks!
September 27, 2022 at 11:44 pm #28384 spiros Participant
I think we should make a note that ß should be considered equivalent to ss when diacritics-insensitive option is checked.
September 28, 2022 at 3:59 am #28385 tuska Participant
2Spiros
Thank you for this addition!
I can’t find this option/setting: “diacritics-insensitive”.
Can you please tell me where I can find it?
September 28, 2022 at 4:21 am #28386 spiros Participant
Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten
September 28, 2022 at 5:06 am #28387 tuska Participant
2spiros
Thank you!
I had made the “Fuzzy Matching Options” window (opened from the “Fuzzy Matching” button) too small. :(
September 28, 2022 at 5:36 am #28388 tuska Participant
Fuzzy Matching Options:
✅ Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten
Text: a o u A O U ä ö ü Ä Ö Ü ß

Filter   Finds
a        a, A, ä, Ä
o        o, O, ö, Ö
u        u, U, ü, Ü
A        - - - - <--
O        - - - - <--
U        - - - - <--
ß        ß
ss       -

Find     Finds
a        a, A, ä, Ä
o        o, O, ö, Ö
u        u, U, ü, Ü
A        a, A, ä, Ä
O        o, O, ö, Ö
U        u, U, ü, Ü
ß        ß
ss       -
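For comparison, the behaviour shown in the Find table (case-insensitive and diacritics-insensitive, with ß matching only itself) can be reproduced with a small Python sketch. It assumes lowercasing followed by removal of nonspacing marks. The names are hypothetical, and this is not a description of EmEditor's code.

```python
import unicodedata
from typing import List

TEXT_CHARS = "a o u A O U ä ö ü Ä Ö Ü ß".split()


def fold(text: str) -> str:
    """Lowercase, then drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def matches(query: str) -> List[str]:
    """Characters from the test text that the query matches after folding."""
    return [ch for ch in TEXT_CHARS if fold(ch) == fold(query)]


for query in ["a", "o", "u", "A", "O", "U", "ß", "ss"]:
    found = matches(query)
    print(query, "->", ", ".join(found) if found else "-")
```

Note that `lower()` is used rather than `casefold()`, because `casefold()` would turn ß into ss, which is not what the table reports.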
September 28, 2022 at 7:34 am #28389 tuska Participant
German umlauts and special characters (ä,ö,ü,Ä,Ö,Ü,ß)
– Addition –

Text   Fuzzy Matching
ä      ae, Ae
Ä      ae, Ae
ö      oe, Oe
Ö      oe, Oe
ü      ue, Ue
Ü      ue, Ue
ß      ss (already mentioned above by user spiros)
September 30, 2022 at 7:46 am #28393 tuska Participant
2Yutaka Emura
https://www.emeditor.com/forums/topic/diacritics-insensitive-search-filtering/#post-28388

Filter   Finds
A        - - - - <--
O        - - - - <--
U        - - - - <--

This is fixed in beta 13 (21.9.913) – September 29, 2022 …

A        a A ä Ä
O        o O ö Ö
U        u U ü Ü

Thank you!
Btw, ß (ss), ä (aede, deae, deaede, Aede, deAe, deAEde), etc.
Is it currently ONLY allowed to set A SINGLE CHARACTER in the “Treat as” field?
If so, could you perhaps also allow multiple characters here? That would be very helpful!
September 30, 2022 at 12:45 pm #28394 spiros Participant
I think a good way of handling these would be an extra option for German variants, since this is German-specific (and, strictly speaking, these are not versions of the same character with and without diacritics).
ä ae, Ae
Ä ae, Ae
ö oe, Oe
Ö oe, Oe
ü ue, Ue
Ü ue, Ue
ß ss
September 30, 2022 at 12:53 pm #28395 Yutaka Emura Keymaster
v21.9.914 allows you to specify a string as well as a character range.
Thanks,
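A multi-character “Treat as” mapping of the kind discussed for German can be sketched in Python as a translation table whose values are strings. The table contents come from the list in this thread. The names, the NFC step, and the lowercasing are assumptions for illustration and do not describe EmEditor's behaviour.

```python
import unicodedata

# Hypothetical German-specific expansions, taken from the list in this thread.
GERMAN_EXPANSIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def expand_german(text: str) -> str:
    """Compose to NFC so umlauts are single code points, then expand them."""
    return unicodedata.normalize("NFC", text).translate(GERMAN_EXPANSIONS)


def matches_german(haystack: str, needle: str) -> bool:
    """Case-insensitive substring test after expanding both sides."""
    return expand_german(needle).lower() in expand_german(haystack).lower()


print(expand_german("Ähnlichkeitssuche"))
print(matches_german("Straße", "strasse"))
print(matches_german("Ähnlichkeitssuche", "aehn"))
```

Unlike diacritic stripping, this maps one character to two, which is why a single-character “Treat as” field was not enough for these cases.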