EmEditor v22.0.0 released (including technical review)!

Today, we are releasing EmEditor v22.0.0.

A major feature of EmEditor v22.0 is Fuzzy Matching: the ability to search, filter, and join CSV documents using approximate string matching. This feature was requested by several customers (1, 2, 3). EmEditor Professional offers several customizable options, while only the Match similar strings option is available in EmEditor Free. The Match similar strings option uses a string metric called Levenshtein distance, or edit distance, to calculate how similar two strings are. In other words, EmEditor compares two strings and counts how many single-character edits (insertions, deletions, or substitutions) are needed to turn one string into the other. For instance, if two strings are:

"fuzzx maching" and "fuzzy matching"

The fifth character “x” of the first string must be replaced with “y”, and a “t” must be inserted before the 9th character “c”. Thus, the edit distance between these two strings is 2.
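
As an illustration of the metric itself, here is a minimal sketch of the classic dynamic-programming computation of Levenshtein distance. It is not EmEditor's implementation; the function name levenshtein is made up for this example, and it counts Python code points, which may differ from how EmEditor counts characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (excluding the current character) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


print(levenshtein("fuzzx maching", "fuzzy matching"))  # 2
```

The printed value agrees with the count above: one substitution plus one insertion.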

If you select the Fuzzy Matching check box in the Find dialog box of EmEditor Professional, the Fuzzy Matching Options dialog box appears, where you can use the Similarity level and Max edit distance options to determine how similar a match must be. For instance, if the Similarity level is 3/4 (75%), an edit distance of up to 1 is allowed for every 4 characters, but never more than the value specified in the Max edit distance option. In other words, an edit distance of up to 1 is allowed if the length of a string is 4 or more, and up to 2 if the length is 8 or more.
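
The arithmetic in this example can be sketched as follows. This is only one reading of the description above, not EmEditor's actual formula: the function name allowed_edits, the rounding down, and the choice of which string's length is measured are all assumptions.

```python
from fractions import Fraction
from math import floor


def allowed_edits(length: int, similarity: Fraction, max_edit_distance: int) -> int:
    """Hypothetical helper: edits tolerated for a string of the given length.

    Assumes the tolerance is length * (1 - similarity), rounded down,
    and then capped by the Max edit distance setting.
    """
    proportional = floor(length * (1 - similarity))
    return min(proportional, max_edit_distance)


level = Fraction(3, 4)  # Similarity level 3/4 (75%)
for n in (3, 4, 7, 8):
    print(n, allowed_edits(n, level, max_edit_distance=2))
```

Using Fraction rather than a float keeps the boundary cases (lengths 4 and 8) exact.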

In EmEditor Professional, there are more options available in the Fuzzy Matching feature. All of the following options can be used without setting the aforementioned Match similar strings option. The Ignore nonspacing combining characters, such as diacritics, dakuten, and handakuten option is especially useful if you would like to ignore diacritics, dakuten, handakuten, and other nonspacing combining characters (except Emoji sequences, which are covered by another option). For instance, the option matches

"e" with "é"
"c" with "ç"
"ハ" with "パ"

When this option is selected, EmEditor applies Unicode Normalization Form D (Canonical Decomposition) to both strings before comparison and ignores nonspacing combining characters while comparing the strings.
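
The same idea can be sketched with Python's standard unicodedata module: decompose to NFD, then drop every character whose General Category is Mn (nonspacing mark). The helper name strip_nonspacing_marks is made up for this example, and EmEditor's own handling may differ in details.

```python
import unicodedata


def strip_nonspacing_marks(text: str) -> str:
    """Apply NFD, then remove nonspacing combining characters (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def same_ignoring_marks(a: str, b: str) -> bool:
    """Compare two strings after stripping nonspacing marks from both."""
    return strip_nonspacing_marks(a) == strip_nonspacing_marks(b)


for plain, marked in (("e", "é"), ("c", "ç"), ("ハ", "パ")):
    print(plain, marked, same_ignoring_marks(plain, marked))
```

All three pairs from the list above compare equal, because the accent, cedilla, and handakuten each become a separate combining character after decomposition.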

The Ignore Emoji sequences option ignores everything in an Emoji sequence except its first code point. For instance:

"👨‍🦰" (red hair man, U+1F468 U+200D U+1F9B0) 
and
"👨‍🦳" (white hair man, U+1F468 U+200D U+1F9B3) 

will not be differentiated.
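
A deliberately simplified sketch of this behavior is shown below. It only understands zero-width-joiner (ZWJ) sequences, variation selectors, and skin-tone modifiers, which is an assumption made for brevity; real Emoji segmentation follows the Unicode Emoji specification and covers more cases (flags, keycaps, tag sequences). The function name reduce_emoji_sequences is hypothetical.

```python
ZWJ = "\u200d"
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}


def is_skin_tone_modifier(ch: str) -> bool:
    """True for the Fitzpatrick modifiers U+1F3FB..U+1F3FF."""
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def reduce_emoji_sequences(text: str) -> str:
    """Keep only the first code point of each (simplified) Emoji sequence."""
    result = []
    skip_next = False
    for ch in text:
        if skip_next:
            # This code point was attached to the previous one by a ZWJ.
            skip_next = False
            continue
        if ch == ZWJ:
            # Drop the joiner and the element that follows it.
            skip_next = True
            continue
        if ch in VARIATION_SELECTORS or is_skin_tone_modifier(ch):
            continue
        result.append(ch)
    return "".join(result)


red_hair_man = "\U0001F468\u200D\U0001F9B0"
white_hair_man = "\U0001F468\u200D\U0001F9B3"
print(reduce_emoji_sequences(red_hair_man) == reduce_emoji_sequences(white_hair_man))  # True
```

Both sequences reduce to U+1F468 (man), so a comparison after this step no longer distinguishes them. Note that ZWJ also appears in non-Emoji text in some scripts, which this sketch does not account for.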

The String/Character ranges option allows maximum flexibility in defining how similar strings are. For instance, if you specify a hyphen “-” to be treated as a space “ ”, the following two strings will not be differentiated.

"fuzzy-matching" and "fuzzy matching"

If you specify an ampersand “&” to be ignored, the following two strings will not be differentiated.

"fuzzy" and "fu&zzy"

You may also specify a character range by selecting a Unicode script, Unicode General Category, minimum and maximum character code values, or a combination of these. For instance, if you specify Unicode General Categories “Pc,Pd,Pe,Pf,Pi,Po,Ps” to be ignored, all punctuation in strings is ignored. Thus, the following two strings will not be differentiated.

"Emurasoft, Inc." and "Emurasoft Inc"

There are more options available in the Fuzzy Matching Options dialog box. Please see the Help for more details.
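
The three String/Character ranges examples above (treating a hyphen as a space, ignoring an ampersand, and ignoring punctuation by General Category) can be sketched as a single preprocessing step applied to both strings before comparison. The function apply_rules and its parameters are hypothetical names for this illustration and do not correspond to EmEditor's settings format.

```python
import unicodedata

# The punctuation General Categories listed in the text.
PUNCTUATION = frozenset({"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"})


def apply_rules(text, replacements=None, ignored_chars="", ignored_categories=frozenset()):
    """Rewrite text so that strings equal under the rules compare equal."""
    replacements = replacements or {}
    result = []
    for ch in text:
        if ch in replacements:
            result.append(replacements[ch])  # e.g. treat "-" as " "
        elif ch in ignored_chars:
            continue                         # e.g. ignore "&"
        elif unicodedata.category(ch) in ignored_categories:
            continue                         # e.g. ignore all punctuation
        else:
            result.append(ch)
    return "".join(result)


print(apply_rules("fuzzy-matching", replacements={"-": " "}))
print(apply_rules("fu&zzy", ignored_chars="&"))
print(apply_rules("Emurasoft, Inc.", ignored_categories=PUNCTUATION))
```

In this sketch, replacements are checked before the ignore rules, so a hyphen mapped to a space survives even when the Pd category is ignored; that ordering is a design choice of the example, not documented EmEditor behavior.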

If you use the Fuzzy Matching option in the Find or Filter command, the fuzzily matched but not identical strings are distinguished by blue wiggly underlines. A future version of EmEditor will display a tooltip to allow you to copy or fix these fuzzy strings if you hover the mouse pointer over these blue wiggly underlines.

You may use the Fuzzy Matching option in Find, Replace, Find in Files, Replace in Files, Batch Find, Batch Replace, Batch Find in Files, Batch Replace in Files, and Join CSV dialog boxes, and Find and Filter toolbars. The Fuzzy Matching Options are currently global app settings and shared through all these dialog boxes and toolbars.

If you use the Fuzzy Matching option in the Join CSV dialog box, you will be able to join two CSV documents with approximately matched strings. Suppose you have two CSV documents:

ID    Company
1     Emurasoft, Inc.
2     Microsoft Corporation
3     Apple Inc.

State    Company
CA       Apple Inc
WA       Microsoft Corp.
WA       Emurasoft Inc

You want to join these two CSV documents on the Company name. Previous versions of EmEditor could not join them correctly because “Emurasoft, Inc.” did not match “Emurasoft Inc”. Fuzzy matching allows you to specify that punctuation should be ignored and that “Corp” should be treated as “Corporation”. Thus, the result will be:

ID    Company                 State   Company
1     Emurasoft, Inc.         WA      Emurasoft Inc
2     Microsoft Corporation   WA      Microsoft Corp.
3     Apple Inc.              CA      Apple Inc
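
To make the join concrete, here is a rough sketch that reproduces the result above by normalizing the Company value into a join key: punctuation is removed and “Corp” is expanded to “Corporation”. The names join_key, fuzzy_join, and SYNONYMS are invented for this example, and the sketch performs a simple inner join; it does not describe how EmEditor implements Join CSV.

```python
import unicodedata

LEFT = [("1", "Emurasoft, Inc."), ("2", "Microsoft Corporation"), ("3", "Apple Inc.")]
RIGHT = [("CA", "Apple Inc"), ("WA", "Microsoft Corp."), ("WA", "Emurasoft Inc")]

# Hypothetical rule: treat "Corp" as "Corporation".
SYNONYMS = {"Corp": "Corporation"}


def join_key(company: str) -> str:
    """Drop punctuation (General Category P*) and expand known synonyms."""
    no_punct = "".join(
        ch for ch in company if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(SYNONYMS.get(word, word) for word in no_punct.split())


def fuzzy_join(left, right):
    """Inner-join (ID, Company) rows with (State, Company) rows on join_key."""
    index = {join_key(company): (state, company) for state, company in right}
    rows = []
    for id_, company in left:
        match = index.get(join_key(company))
        if match is not None:
            rows.append((id_, company, match[0], match[1]))
    return rows


for row in fuzzy_join(LEFT, RIGHT):
    print(*row, sep="\t")
```

The original Company text from each document is kept in the output; only the key used for matching is normalized.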

You can also apply the Fuzzy Matching option to all search strings defined in the Batch Find or Batch Replace dialog box. To set or clear the Fuzzy Matching option for all the batch items, select all items in the batch list and right-click to display a menu, where you can toggle the Fuzzy Matching option. However, the Fuzzy Matching option slows down the search significantly if you have many search strings or the document is very large.

Other features of v22.0 include the ability to highlight MIME Encoded Words (Base64) used in email message headers, which was requested by a customer. For instance, if a message header contains the following lines:

Subject: =?UTF-8?B?W0VtRWRpdG9yICjjg4bjgq3jgrnjg4jjgqjjg4fjgqPjgr8pXQ==?=
 =?UTF-8?B?IOOCqOODs+OCs+ODvOODieOBruWumue+qeOBq+aXouWumuWIhui/veWKoA==?=

EmEditor will highlight these lines, and display a tooltip to allow you to reveal or copy the original string if you hover the mouse pointer over the header.
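
For readers curious about what the tooltip reveals, the following sketch decodes Base64 (“B”) encoded words of the form =?charset?B?data?= with a regular expression. It is an illustration only: the function name decode_b_words is made up, the “Q” encoding is not handled, and Python's standard email.header.decode_header provides a fuller implementation.

```python
import base64
import re

# =?charset?B?base64-data?=  (only the Base64 "B" encoding is handled here)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")
# Whitespace between two adjacent encoded words is not part of the text.
BETWEEN_WORDS = re.compile(r"(\?=)\s+(=\?)")


def decode_b_words(value: str) -> str:
    """Replace every Base64 encoded word in a header value with its text."""
    value = BETWEEN_WORDS.sub(r"\1\2", value)

    def decode(match):
        charset, data = match.group(1), match.group(2)
        return base64.b64decode(data).decode(charset)

    return ENCODED_WORD.sub(decode, value)


subject = (
    "=?UTF-8?B?W0VtRWRpdG9yICjjg4bjgq3jgrnjg4jjgqjjg4fjgqPjgr8pXQ==?=\n"
    " =?UTF-8?B?IOOCqOODs+OCs+ODvOODieOBruWumue+qeOBq+aXouWumuWIhui/veWKoA==?="
)
print(decode_b_words(subject))
```

The folded continuation line is joined to the first one because the line break and leading space between two encoded words are dropped before decoding.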

A customer asked for the ability to customize how a string in the Clipboard should be pasted. If you copy a string from a vertical selection and paste it to plain text, the result may not be exactly what you expect. In this case, click on the Clipboard icon which appears when you’ve pasted, then the Clipboard History window will appear. Right-clicking on the string you’ve just pasted will bring up a context menu, where you will be able to select the Insert as Characters, Insert as Lines, Insert as Vertical, or Insert as Cells command.

The default Main menu was redesigned again to include the Insert, Convert, Bookmarks, Sort, and Plug-ins popup menu items at the top. If your keyboard doesn’t include keys specifically used for diacritical characters, you might find the Diacritics submenu in the Insert menu useful when you need to type these characters without memorizing the corresponding shortcut keys. I hope you like these changes if you use the default Main menu, but you can always customize the menu by selecting Customize Menus on the Tools menu if you don’t like the default menu.

v22.0 supports Unicode 15.0. For instance, the following characters are new Emoji characters added in Unicode 15.0.

🫨 U+1FAE8 (SHAKING FACE)
🩷 U+1FA77 (PINK HEART)
🫎 U+1FACE (MOOSE)
🛜 U+1F6DC (WIRELESS)

While a font supporting Unicode 15.0 is necessary to display the above characters correctly, you can copy and paste them into EmEditor and use the Character Code Value command (Ctrl+I) with the cursor placed at the left side of each character to display its correct Unicode Name. The update affects the Unicode Name, Unicode Script, and Unicode General Category displayed by the Character Code Value command. It also affects the width of characters determined by East Asian Width and the Character Check feature. However, the update does NOT affect the Onigmo regular expression engine, which is currently still based on a previous version of Unicode.
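
The dependence on a particular Unicode database version is easy to observe outside EmEditor as well. The sketch below uses Python's unicodedata module, whose bundled Unicode version varies with the Python release; the helper name describe is made up. On an interpreter whose database predates Unicode 15.0, the four characters above are reported without a name, much as a component based on an older Unicode version would not recognize them.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME', with a fallback for unknown names."""
    name = unicodedata.name(ch, "<name not in this Unicode database>")
    return f"U+{ord(ch):04X} {name}"


print("Unicode database version:", unicodedata.unidata_version)
for ch in ("\U0001FAE8", "\U0001FA77", "\U0001FACE", "\U0001F6DC"):
    print(describe(ch))
```

The code point is always available; only the name lookup depends on the database version.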

Finally, the CommitList (Git) plug-in was updated by adding the ability to compare branches, and other improvements to make the plug-in easier to use.

I hope you like EmEditor, whether you use the Professional or Free version. Please contact us or write in the forums if you have any questions, feature requests, or ideas in the future.

Thank you for using EmEditor!
Yutaka Emura

Please see EmEditor v22.0 New Features for details and screenshots.

This release also includes all bug fixes made while developing v22.0.

If you use the Desktop Installer version, you can select Check for Updates on the Help menu to download the newest version. If this method fails, please download the newest version, and run the downloaded installer. If you use the Desktop portable version, you can go to the Download page to download the newest version. The Store App versions can be updated through Microsoft Store (64-bit or 32-bit) after a few days. If you use winget, you can type “winget install emeditor” to install the latest version of EmEditor (64-bit or 32-bit detected automatically).