- AuthorPosts
- September 29, 2006 at 3:27 pm #3834userParticipant
hello
how do I delete duplicate lines?
I mean lines that are identical
thanks
September 29, 2006 at 7:56 pm #3835Yutaka EmuraKeymaster
It isn’t easy to do with regular expressions, but how about a macro like this (JavaScript):
// Create an array
var a = new Array();
// Fill the array a with all lines (with returns) in the document.
document.selection.StartOfDocument();
for( ; ; ){
    var y = document.selection.GetActivePointY( eePosLogical );
    document.selection.SelectLine();
    var sLine = document.selection.Text;
    if( sLine == "" ) {   // Reached the end of document, escape from the loop
        break;
    }
    a.push( sLine );
    document.selection.Collapse();
    if( document.selection.GetActivePointY( eePosLogical ) == y ) {
        // Reached the end of document (the last line without return), escape from the loop
        break;
    }
}
// Delete duplicate elements, keeping the first occurrence of each line.
for( var i = 0; i < a.length; i++ ){
    sLine = a[i];
    for( var j = i + 1; j < a.length; j++ ){
        if( sLine == a[j] ){
            a.splice( j, 1 );
            j--;
        }
    }
}
// Replace the entire document with new elements
document.selection.SelectAll();
document.selection.Text = a.join( "" );
Please let me know if you have questions. :-)
September 29, 2006 at 8:42 pm #3836userParticipant
Thank you for your reply.
My congrats and best wishes with the new forums :)
April 18, 2009 at 9:10 am #7153HelladosMember
This is a good macro for small files, but it is very slow for me :-(
I have text files of 50-100 MB and need to remove duplicate lines (words) from more than 406,000 words, and this macro runs very slowly :(
My PC’s performance is very good: Intel Core 2 Duo E8400, 2 GB Corsair RAM, 1 TB HDD.
What can I do?
:-(
April 18, 2009 at 9:26 pm #7156Yutaka EmuraKeymaster
I did some optimization. Please try this. It also shows the current status on the status bar.
function Pair( i, s )
{
    this.index = i;
    this.str = s;
}
nLines = document.GetLines();
// Create an empty array; push() below appends, so it must not be pre-sized.
a = new Array();
status = "Reading lines...";
// Fill the array a with all lines (with returns) in the document.
for( i = 1; i <= nLines; i++ ) {
    if( (i % 1000) == 0 ){
        status = "Reading lines: " + String(i) + "/" + String(nLines);
    }
    var pair = new Pair( i, document.GetLine( i, eeGetLineWithNewLines ) );
    a.push( pair );
}
status = "Sorting lines...";
// Sort by text; equal lines are ordered by line number, so the first occurrence comes first.
a.sort( function(a,b){
    if( a.str > b.str ){
        return 1;
    }
    if( a.str < b.str ){
        return -1;
    }
    return a.index - b.index;
});
// Mark duplicate elements; equal lines are now adjacent.
for( i = 1; i < nLines; i++ ){
    if( (i % 10) == 0 ){
        status = "Deleting duplicate lines: " + String(i + 1) + "/" + String(nLines);
    }
    if( a[i].str == a[i-1].str ){
        a[i].index = 0;  // disable
    }
}
status = "Sorting lines again...";
// Restore the original order; disabled entries (index 0) move to the front.
a.sort( function(a,b){
    return a.index - b.index;
});
var str = "";
for( i = 0; i < nLines; i++ ){
    if( a[i].index != 0 ){
        if( (i % 1000) == 0 ){
            status = "Joining lines: " + String(i + 1) + "/" + String(nLines);
        }
        str += a[i].str;
    }
}
// Replace the entire document with new elements
document.selection.SelectAll();
document.selection.Text = str;
status = "Duplicate lines deleted.";
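For readers who want to try the idea outside EmEditor, here is a minimal sketch of the sort-based approach the macro above uses: tag each line with its position, sort by text so equal lines become adjacent, drop the repeats, then sort back by position. Sorting costs roughly n log n comparisons, whereas the nested loop in the first macro compares every pair of lines. The function name `dedupeBySorting` and the plain array input are assumptions made for illustration; they are not part of the EmEditor macro API.

```javascript
// Sketch only: dedupeBySorting is a hypothetical helper, not an EmEditor call.
// It takes an array of line strings and returns a new array without repeats,
// keeping the first occurrence of each line in its original position.
function dedupeBySorting(lines) {
    var pairs = [];
    for (var i = 0; i < lines.length; i++) {
        pairs.push({ index: i, str: lines[i] });
    }
    // Sort by text; ties are broken by position so the earliest copy comes first.
    pairs.sort(function (p, q) {
        if (p.str > q.str) return 1;
        if (p.str < q.str) return -1;
        return p.index - q.index;
    });
    // Keep a pair only if it differs from its predecessor in sorted order.
    var kept = [];
    for (var k = 0; k < pairs.length; k++) {
        if (k === 0 || pairs[k].str !== pairs[k - 1].str) {
            kept.push(pairs[k]);
        }
    }
    // Restore the original document order.
    kept.sort(function (p, q) { return p.index - q.index; });
    var result = [];
    for (var m = 0; m < kept.length; m++) {
        result.push(kept[m].str);
    }
    return result;
}
```

Calling `dedupeBySorting(["b\n", "a\n", "b\n"])` keeps the first "b" line and the "a" line in their original order.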
January 3, 2010 at 7:33 pm #7994SalabimParticipant
Hi Yutaka,
regarding the last (faster) duplicate line macro you posted, is it possible to change the code so that the last line…
status = "Duplicate lines deleted."
… could actually show how many duplicate lines were deleted?
Something like:
“117 duplicate lines deleted.”
January 5, 2010 at 6:51 am #8009Yutaka EmuraKeymaster
Then how about this?
function Pair( i, s )
{
    this.index = i;
    this.str = s;
}
nLines = document.GetLines();
// Create an empty array; push() below appends, so it must not be pre-sized.
a = new Array();
status = "Reading lines...";
// Fill the array a with all lines (with returns) in the document.
for( i = 1; i <= nLines; i++ ) {
    if( (i % 1000) == 0 ){
        status = "Reading lines: " + String(i) + "/" + String(nLines);
    }
    var pair = new Pair( i, document.GetLine( i, eeGetLineWithNewLines ) );
    a.push( pair );
}
status = "Sorting lines...";
// Sort by text; equal lines are ordered by line number, so the first occurrence comes first.
a.sort( function(a,b){
    if( a.str > b.str ){
        return 1;
    }
    if( a.str < b.str ){
        return -1;
    }
    return a.index - b.index;
});
// Mark duplicate elements; equal lines are now adjacent.
for( i = 1; i < nLines; i++ ){
    if( (i % 10) == 0 ){
        status = "Deleting duplicate lines: " + String(i + 1) + "/" + String(nLines);
    }
    if( a[i].str == a[i-1].str ){
        a[i].index = 0;  // disable
    }
}
status = "Sorting lines again...";
// Restore the original order; disabled entries (index 0) move to the front.
a.sort( function(a,b){
    return a.index - b.index;
});
var str = "";
n = 0;  // number of duplicate lines removed
for( i = 0; i < nLines; i++ ){
    if( a[i].index != 0 ){
        if( (i % 1000) == 0 ){
            status = "Joining lines: " + String(i + 1) + "/" + String(nLines);
        }
        str += a[i].str;
    }
    else {
        n++;
    }
}
// Replace the entire document with new elements
document.selection.SelectAll();
document.selection.Text = str;
status = n + " duplicate lines deleted.";
January 6, 2010 at 8:07 am #8015SalabimParticipant
Thanks a lot Yutaka! :)
June 5, 2011 at 8:47 pm #9409MonkeymanMember
Thank you for the good macro. Removing duplicate lines is a very nice and useful feature, which EmEditor lacks badly. I hope you’ll add it in a future release.
As for the JS macro provided, it has one small “glitch”. When a duplicate line is the last line of the document, the macro doesn’t recognize it. For example:
Badger
Eagle
Simpsons
Donkey
Badger
There’s no new line after the second “Badger”, so the macro won’t delete it.
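One possible way around this glitch is to compare lines with their line terminators stripped, so that a final "Badger" without a newline still matches the earlier "Badger" followed by one. The sketch below shows the idea in plain JavaScript rather than as an EmEditor macro. The names `stripNewline` and `removeDuplicateLines` are made up for illustration, and `Set` assumes a reasonably modern script engine.

```javascript
// Hypothetical helper: remove one trailing "\r\n", "\n", or "\r" from a line.
function stripNewline(line) {
    return line.replace(/(?:\r\n|\n|\r)$/, "");
}

// Sketch only: remove duplicate lines from a text, keeping the first
// occurrence and the original order. Lines are compared without their
// terminators, so an unterminated last line can still count as a duplicate.
function removeDuplicateLines(text) {
    // Split into lines while keeping each line's terminator attached;
    // the second alternative picks up a final line that has no terminator.
    var lines = text.match(/[^\r\n]*(?:\r\n|\n|\r)|[^\r\n]+$/g) || [];
    var seen = new Set();
    var kept = [];
    for (var i = 0; i < lines.length; i++) {
        var key = stripNewline(lines[i]);
        if (!seen.has(key)) {
            seen.add(key);
            kept.push(lines[i]);
        }
    }
    return kept.join("");
}
```

With the example above as input, the trailing "Badger" is dropped and the four remaining lines keep their order. The same comparison could be applied inside the macro by stripping the terminator from each `str` before sorting, which is left here as an untested suggestion.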
July 28, 2011 at 9:36 am #9517raikriveraMember
Thank you so much, guys. This is what I was searching for.
August 10, 2011 at 8:21 pm #9531DeipotentParticipant
I needed this functionality recently and thought it could be done easily with regex. Google led me to http://www.regular-expressions.info/duplicatelines.html which said to sort the lines, and then search for the following:
^(.*)(\r?\n\1)+$
and replace with:
\1
Unfortunately, I couldn’t get this to work in EmEditor, even after enabling the option to search past line boundaries.
Can you add support for this type of regex to EmEditor?
PS. I haven’t tried the macro yet, and am sure it works fine, but it would be nice if it could be done with regex.
August 10, 2011 at 9:11 pm #9536Yutaka EmuraKeymaster
In EmEditor, \r is ignored, so could you try using this?
^(.*)(\n\1)+$
In future versions, I might add a new command to remove duplicate lines, so you won’t need to use regular expression replace in the future.
Thanks!
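To see what this pattern does, here is a small sketch using JavaScript's own regular expression engine rather than EmEditor's Replace dialog. It keeps the optional \r from the original pattern because JavaScript does not ignore it, and it writes the replacement as "$1" where the dialog uses \1. The function name `collapseAdjacentDuplicates` is an assumption for illustration. The input must already be sorted, or at least have its duplicates on adjacent lines.

```javascript
// Sketch only: collapse each run of identical adjacent lines into one line.
// The "m" flag lets ^ and $ match at line boundaries, and \1 refers back
// to the line captured by the first group.
function collapseAdjacentDuplicates(text) {
    return text.replace(/^(.*)(\r?\n\1)+$/gm, "$1");
}
```

Lines that repeat but are not adjacent are left alone, which is why the lines have to be sorted first.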
August 10, 2011 at 11:50 pm #9541DeipotentParticipant
Thanks Yutaka, that did the trick!
A new command would be useful, particularly for people who don’t use scripting or RegEx, as a lot of other text editors include a simple menu option for removing duplicates. It would also allow you to highly optimise it, and to include an option to keep the original order (i.e. so you don’t have to sort it first), which would be useful.
One of my other suggestions was for a simple RegEx library feature, so you can create or import regexes with a name, an optional description, and possibly find settings. This would allow you to select the relevant regex from the library list and then run it.
August 11, 2011 at 12:37 am #9544Yutaka EmuraKeymaster
We are working on such features for future versions.
Thanks!