- AuthorPosts
- June 20, 2014 at 7:44 am #18566 bbacher Participant
I have a 580 MB text file, which represents about 130k lines of column-specific data – but all the data is on one line with no carriage returns. Sometimes I need to verify the format of the file, and the easiest way to do that is to break it down into 130k lines, and make sure all the columns line up properly. Currently, I’m using UltraEdit to do that, because it has a function to insert a character string after a specified number of bytes. I use that and insert a carriage return at 4,689 byte increments.
Does EmEditor have any similar functionality? I prefer EmEditor, but I’m stuck still using UltraEdit because I don’t know how to do that in EmEditor.
Thanks very much for any advice.
June 20, 2014 at 11:00 am #18567 Yutaka Emura Keymaster
Hello,
For this purpose, a macro is the best way to go. You can record a macro (Right 4689 times, and press Return), and then “Run with Temporary Options” with repeat count 130K or more.
June 20, 2014 at 11:11 am #18568 bbacher Participant
Thanks, I’ll try that.
June 20, 2014 at 12:50 pm #18576 Stefan Participant
RegEx could be a little bit slow (by its nature), but you may try this too:
Search & Replace
Find: (.{4689}\s)
Replace: \1\n
Explanation:
The dot matches any single character (letter, digit, or other sign), and {4689} repeats it exactly 4689 times.
The \s requires a whitespace character right after those 4689 characters, so we break at whitespace only.
({4689} is the regular exact-count syntax; {4689,} would mean at least 4689, extending until a whitespace is found.)
Then we replace with what was matched by the (…) back-reference group and insert a line break. (\n is enough, even on DOS/Windows, see Help.)
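A small sketch of how the pattern behaves, assuming plain JavaScript regular expressions rather than EmEditor's own engine (the two should agree on these basic constructs, but that is an assumption). The function names are made up for illustration, and the width is a parameter so it can be tried on short strings. It shows that the variant with \s changes nothing when no whitespace follows the counted characters, while the variant without \s breaks at every fixed width.

```javascript
// Same shape as  Find: (.{4689}\s)  Replace: \1\n  with the width as a parameter.
// Matches only where a whitespace character follows `width` characters.
function splitWithWhitespace(text, width) {
  const re = new RegExp("(.{" + width + "}\\s)", "g");
  return text.replace(re, "$1\n");
}

// Variant without \s: breaks after every `width` characters, whatever follows.
function splitFixed(text, width) {
  const re = new RegExp("(.{" + width + "})", "g");
  return text.replace(re, "$1\n");
}
```

With a width of 4, splitWithWhitespace("AAAABBBBCCCC", 4) returns the input unchanged because there is no whitespace to match, while splitFixed("AAAABBBBCCCC", 4) returns "AAAA\nBBBB\nCCCC\n".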
HTH?
June 20, 2014 at 1:02 pm #18580 bbacher Participant
Thanks, I’ll try that also. Sometime next week I’ll do some performance tests and post back.
June 20, 2014 at 1:32 pm #18581 Stefan Participant
To create a macro you could do this:
– go to top of file
– go to start of line
– Macros > Start
– press right arrow key one time
– press Enter key
– Macro > Stop
– Macro > Edit (choose JavaScript)
– change ‘1’ to ‘4689’ for CharRight
– save file (to a temporary file, or to your EmEditor folder for later reuse)
– Macro > Select This
– go back to your document
– undo
– execute macro with temporary options
– – –
To make the macro run much quicker, add this at the top of the macro:
Redraw = false;
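For reference, the edited macro from the steps above might look roughly like this. It is only a sketch: CharRight with a count and Redraw come from this thread, while NewLine() and StartOfDocument() on document.selection are assumed to be what EmEditor records for Enter and "go to top of file". The helper name insertBreaks is made up, and the loop stands in for the repeat count of "Run with Temporary Options".

```javascript
// Move right `width` characters, insert a line break, and repeat.
// `sel` is expected to behave like EmEditor's document.selection.
function insertBreaks(sel, width, repeat) {
  for (var i = 0; i < repeat; i++) {
    sel.CharRight(false, width); // false = move without extending the selection
    sel.NewLine();               // assumed equivalent of pressing Enter
  }
}

// Runs only inside EmEditor, where `document.selection` exists.
if (typeof document !== "undefined" && document.selection) {
  Redraw = false;                       // do not repaint while the macro runs
  document.selection.StartOfDocument(); // go to top of file (assumed call)
  insertBreaks(document.selection, 4689, 130000);
}
```

A repeat count larger than the number of records would presumably keep adding empty lines at the end of the file, so the count should match the data.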
June 20, 2014 at 2:26 pm #18582 Stefan Participant
Ah, and I forgot the official way:
– Tools > Properties for current Configuration
– General > Wrap by Chars : 4689
OK
To make this permanent by inserting real line breaks:
– Select All
– Edit > Convert Selection > Split Lines
If needed often, this could also be stored as a macro and added as a menu button or executed by a shortcut key.
June 20, 2014 at 3:52 pm #18583 bbacher Participant
That’s brilliant! Thank you
June 23, 2014 at 11:08 am #18587 bbacher Participant
The macro method is very, very slow. It ran for about five minutes with no visible results – then I closed the program.
The RegEx Search/Replace searched to the end of the document and did nothing (did not find anything).
June 23, 2014 at 11:13 am #18588 bbacher Participant
I don’t MIND using UltraEdit for this process; I can continue doing that. The whole process runs in about 13 seconds, and produces 134,715 lines on today’s file.
I was just hoping EmEditor had a similar function that I hadn’t found.
June 23, 2014 at 1:57 pm #18589 Stefan Participant
Interesting.
The ways shown above should do it. Well, at least with smaller file sizes.
Unfortunately I can’t really test on my 32-bit system… I have too many things running and not enough free memory for testing.
For a test I created a 581 MB text file with 130,000 x 4689 characters.
Opening that in 32-bit EmEditor 14.4.0b2 shows this dialog:
---------------------------
EmEditor
---------------------------
This file contains a very long line. Very long lines will be split into several lines while the document is open, but will be recombined when saved. Highlighting very long lines is disabled, and some other features such as find and replace text containing CR or LF might not work correctly for very long lines.
c:\temp\jsout.txt
---------------------------
OK   Abbrechen [Cancel]
---------------------------
and I ended up with 5 lines:
4x 134217729 chars
1x 072699089 chars
“Wrap by Char” was disabled.
RegEx s&r allocated 1.2 GB and I canceled after 3 minutes.
So I just quickly split the lines with SED.exe:
c:\temp>sed -e "s/.\{4689\}/&\n/g" jsout.txt > jsoutSplitted.txt
Unfortunately here I also ran out of memory, but with a smaller file this works well.
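One way around the memory limit is to never hold the whole line at once. The following is a minimal sketch, assuming Node.js is available and that one byte equals one character (hence the "latin1" decoding). The names makeFixedWidthSplitter and splitFile are hypothetical. It reads the file in chunks and inserts a line break after every 4689 bytes, carrying the position within the current record from one chunk to the next.

```javascript
const fs = require("fs");
const { Transform, pipeline } = require("stream");

// Returns a function that inserts `eol` after every `width` characters,
// remembering how far into the current record the previous chunk ended.
function makeFixedWidthSplitter(width, eol) {
  if (eol === undefined) eol = "\n";
  let used = 0; // characters already emitted for the current record
  return function split(chunk) {
    let out = "";
    let pos = 0;
    while (pos < chunk.length) {
      const take = Math.min(width - used, chunk.length - pos);
      out += chunk.slice(pos, pos + take);
      pos += take;
      used += take;
      if (used === width) { // record complete: close the line
        out += eol;
        used = 0;
      }
    }
    return out;
  };
}

// Streams inPath to outPath; `done` receives an error, if any.
function splitFile(inPath, outPath, width, done) {
  const split = makeFixedWidthSplitter(width);
  const breaker = new Transform({
    transform(chunk, _encoding, callback) {
      // latin1 maps each byte to exactly one character and back again
      callback(null, Buffer.from(split(chunk.toString("latin1")), "latin1"));
    }
  });
  pipeline(fs.createReadStream(inPath), breaker, fs.createWriteStream(outPath), done);
}
```

It could be called as splitFile("c:\\temp\\jsout.txt", "c:\\temp\\jsoutSplitted.txt", 4689, callback). Like the sed command, it also adds a line break after the last complete record.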
So on a 64-bit system that should be no problem… I will try tomorrow, just out of curiosity.
Code to create a test file (due to limited memory I had to do an additional step as a workaround):
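// Build one record of 4689 characters.
Line = "";
for (B = 1, Bs = 4689; B <= Bs; B++) { Line += "X"; }
// Workaround for low memory: build a block of only 13,000 records ...
File = "";
for (L = 1, Ls = 13000; L <= Ls; L++) { File += Line; }
// ... and write that block 10 times, giving 130,000 records on one line.
fso = new ActiveXObject("Scripting.FileSystemObject");
oFile = fso.OpenTextFile("C:\\temp\\jsout.txt", 2, true); // 2 = ForWriting, true = create if missing
for (F = 1; F <= 10; F++) oFile.Write(File);
oFile.Close();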