Tagged: separate byte tsv csv column
January 6, 2014 at 1:18 am #17804 no1 Participant
Hi all,
Is there any way to separate data by bytes?
Bytes! Not characters. A non-English character may be more than one byte.
And my source data is byte-based, so I have to do it this way.
The number of bytes in each column:
8,50,30,30,30,50,50,50,50,30,15,8,1,6,25,25,55,55,55,55,12,30,11,3,30,24,55,55,55,55,12,30,11,3,30,55,55,55,55,3,30,30,10,3,1,1,1,30,1,10,4,1,6,3,8,10,6,30,10,6,6,1,1,1,1,1,1,1,1,30,30,30,1,8,8,8,8,8,8,8,8,8,8,8,8,8,1,10,3,30,4,5,7,7,4,3,30,5,19,19,19,19,20,7,8,9,19,19,8,10,30,50,5,35,3,30,3,30,4,30,4,30,25,4,50,3,30,3,30,30,40,30,40,30,40,30,40,30,40,30,40,6,8,50,50,50,5,25,6,6,70,20,1,5,5,3,10,30,25,5,30,8,3,3,8,8,3,5,3,8,8,6,8,2,2,8,12,3,2,8,1,5,5,1
Make it TSV or CSV. (Separate with tabs or commas)
Or any other tool?
TIA!
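For readers who want to see the idea outside the editor, here is a minimal Node.js sketch of cutting one record by byte widths rather than by characters. The helper name `splitByByteWidths` is made up for this illustration and is not an EmEditor function; the sketch also assumes the declared widths always fall on character boundaries.

```javascript
// Sketch: cut one record into fields by BYTE widths, not character counts.
// `splitByByteWidths` is a hypothetical helper name, not an EmEditor API.
function splitByByteWidths(record, widths) {
  const fields = [];
  let offset = 0;
  for (const width of widths) {
    // subarray() slices by byte offset, so a multi-byte character stays intact
    // as long as the declared widths fall on character boundaries (assumed here).
    fields.push(record.subarray(offset, offset + width));
    offset += width;
  }
  return fields;
}

// A 3-byte ASCII field followed by a 6-byte field holding two CJK characters.
const exampleRecord = Buffer.from("abc诸如", "utf8");
const exampleFields = splitByByteWidths(exampleRecord, [3, 6]).map((f) => f.toString("utf8"));
console.log(exampleFields.join("\t"));
```

Working on a `Buffer` keeps every offset in bytes, which is the unit the width list above is given in.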
January 6, 2014 at 4:56 pm #17812 Yutaka Emura Keymaster
Hello,
How about opening a file as Binary (ASCII View)?
January 6, 2014 at 9:49 pm #17817 no1 Participant
But I need a macro or something…
There are about 200 columns, and the file is large.
January 8, 2014 at 11:16 am #17837 Stefan Participant
I do not understand what you are after, no1.
Can you explain in more detail? Ideally with before/after examples and the rules to apply.
BEFORE:
xxxxxx
AFTER:
xx; xx; xx;
Stefan
January 10, 2014 at 6:39 pm #17843 no1 Participant
I’ll explain more with some pictures later.
For now, please read this:
http://www.ultraedit.com/forums/viewtopic.php?f=2&t=14711
Thank you all!
January 14, 2014 at 1:25 pm #17849 no1 Participant
Let me give an example with some pictures:
The bytes are continuous in the file. To make it clear in the picture, I broke the stream at 0D0A.
The data fields are fixed length in bytes. The solid red lines are where I want to insert the delimiter bytes.
It would be best if there were a way to insert the delimiter bytes into the byte stream directly. That is exactly what I want, but I don’t know of such a way. So currently I have to open the files with a text editor and handle the text by characters.
The data fields are fixed length in bytes. But the number of characters could be different. (I highlighted some corresponding bytes and characters with different colors.)
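To make the mismatch concrete, here is a small Node.js sketch that counts characters and UTF-8 bytes for a few of the field values from this example. `countCharsAndBytes` is a name invented for the illustration, and "characters" here means Unicode code points.

```javascript
// Sketch: compare character count and UTF-8 byte count for a few field values.
// `countCharsAndBytes` is a hypothetical helper name used only for illustration.
function countCharsAndBytes(text) {
  return {
    chars: [...text].length,                // Unicode code points
    bytes: Buffer.byteLength(text, "utf8"), // bytes once encoded as UTF-8
  };
}

for (const value of ["Field 4   ", "诸如此类   ", "エトセトラ"]) {
  const { chars, bytes } = countCharsAndBytes(value);
  console.log(JSON.stringify(value), "chars:", chars, "bytes:", bytes);
}
```

The ASCII field has the same count either way, while the two fields with CJK or katakana text fill 15 bytes with far fewer characters.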
UltraEdit’s Convert to Character Delimited command can only handle the text by characters. (And I don’t know if regular expressions can work by bytes.) So I insert ! characters next to each multi-byte character, according to its number of bytes.
Now the number of characters is equal to the number of original bytes, which lets UltraEdit’s Convert to Character Delimited command insert the delimiters at the right positions.
Any better solutions/tools are welcome.
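The padding workaround described above could be sketched in Node.js roughly as follows. This is only an illustration: `padToByteCount` is a made-up name, it assumes the filler `!` never occurs in the real data, and it counts code points (an editor that counts UTF-16 units would differ for characters outside the Basic Multilingual Plane). The post does not say how the fillers are removed afterwards, so that step is left out.

```javascript
// Sketch: pad each multi-byte character with filler characters so that the
// character count of the text equals its original UTF-8 byte count.
// `padToByteCount` is a hypothetical helper; "!" as filler follows the post.
function padToByteCount(text, filler = "!") {
  let padded = "";
  for (const ch of text) {
    const byteCount = Buffer.byteLength(ch, "utf8");
    // A character of N bytes gets N - 1 fillers, so it occupies N positions.
    padded += ch + filler.repeat(byteCount - 1);
  }
  return padded;
}

const paddedExample = padToByteCount("エトセトラ[10 bytes]");
console.log(paddedExample, [...paddedExample].length);
```

After padding, a character-based fixed-width split at 15 and 10 lands on the same boundaries as a byte-based split of the original data.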
The example text and its hex (UTF-8), with the hex broken after each 0D0A for readability:
(Column widths: 8, 30, 15, 10, 13 bytes)
Field 1 Field 2 ăĕĭŏŭ âêîôû Field3(15bytes)Field 4   Field 5, etc.
123     Any unicode string without tab诸如此类             Field 5, etc.
12345678[---This field is 30 bytes---]エトセトラ[10 bytes]Field 5, etc.
4669656C642031204669656C64203220C483C495C4ADC58FC5AD20C3A2C3AAC3AEC3B4C3BB204669656C64332831356279746573294669656C6420342020204669656C6420352C206574632E0D0A
3132332020202020416E7920756E69636F646520737472696E6720776974686F757420746162E8AFB8E5A682E6ADA4E7B1BB202020202020202020202020204669656C6420352C206574632E0D0A
31323334353637385B2D2D2D54686973206669656C642069732033302062797465732D2D2D5DE382A8E38388E382BBE38388E383A95B31302062797465735D4669656C6420352C206574632E0D0A
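As one possible "other tool", the delimiter bytes can be inserted into the byte stream directly with a short Node.js script. The sketch below is not an EmEditor feature. It assumes every record is exactly the sum of the column widths followed by 0D0A, as in the hex dump here, and it silently ignores any trailing partial record. The function name `insertDelimiters` is invented for the illustration.

```javascript
// Sketch: insert a tab byte between fixed-width fields, working on bytes only.
const WIDTHS = [8, 30, 15, 10, 13];          // byte widths from the example
const DELIMITER = Buffer.from([0x09]);       // tab
const LINE_END = Buffer.from([0x0d, 0x0a]);  // CRLF, as in the hex dump

// Hypothetical helper: assumes each record is sum(widths) bytes plus CRLF.
function insertDelimiters(data, widths) {
  const recordLength = widths.reduce((sum, w) => sum + w, 0);
  const stride = recordLength + LINE_END.length;
  const pieces = [];
  for (let start = 0; start + stride <= data.length; start += stride) {
    let offset = start;
    widths.forEach((width, index) => {
      if (index > 0) pieces.push(DELIMITER);
      pieces.push(data.subarray(offset, offset + width));
      offset += width;
    });
    pieces.push(LINE_END);
  }
  return Buffer.concat(pieces);
}

// Third record of the hex dump.
const sampleBytes = Buffer.from(
  "31323334353637385B2D2D2D54686973206669656C642069732033302062797465732D2D2D5D" +
    "E382A8E38388E382BBE38388E383A95B31302062797465735D4669656C6420352C206574632E0D0A",
  "hex"
);
console.log(insertDelimiters(sampleBytes, WIDTHS).toString("utf8"));
```

For large files the same slicing could be applied chunk by chunk from a read stream; that part is omitted to keep the sketch short.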
By the way, about the picture above, in case anyone is interested:
All the colored highlighting and lines in the text were done within EmEditor, not with an image editor.
Thank you, Yutaka, for the new features.
The User-Defined Guides are useful.
January 14, 2014 at 1:36 pm #17853 no1 Participant
Drag and drop the picture to see it at its original size.
January 14, 2014 at 4:30 pm #17854 Yutaka Emura Keymaster
Hi no1,
After you open the file as Binary (ASCII View), you can record a macro. For example, with the cursor at the top-left corner, click the “Record All Except Mouse/Keyboard Activities” button on the toolbar and record your keystrokes like this:
Press RIGHT 8 times, COMMA (,), RIGHT 30 times, COMMA (,), RIGHT 15 times, COMMA (,), RIGHT 10 times, COMMA (,), RIGHT 13 times, … , HOME, DOWN.
Then stop recording the macro. This records a macro like the following (in JavaScript):
document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.Text=","; document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.Text=","; document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); 
document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.Text=","; document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.Text=","; document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.CharRight(false,1); document.selection.Text=","; document.selection.StartOfLine(false,eeLineView | eeLineHomeText); document.selection.LineDown(false,1);
Then you can choose Run with Temporary Options on the Macros menu and enter the number of lines as the “Repeat Count”. After the macro has run, save the file, reopen it, and switch to CSV mode. I hope this helps.
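With around 200 columns, recording those keystrokes by hand is tedious. One option is to generate the same macro text from the width list. The Node.js sketch below only builds a string that mirrors the recorded statements shown above. `buildMacroSource` is a made-up helper, and the generated text has not been run in EmEditor here, so treat it as an assumption that the result behaves like a hand-recorded macro.

```javascript
// Sketch: generate macro source text that mirrors the recorded macro,
// one CharRight call per byte and one delimiter insert per column.
// `buildMacroSource` is a hypothetical helper, not part of EmEditor.
function buildMacroSource(widths, delimiter = ",") {
  const lines = [];
  for (const width of widths) {
    for (let i = 0; i < width; i++) {
      lines.push("document.selection.CharRight(false,1);");
    }
    // JSON.stringify quotes and escapes the delimiter (e.g. a tab becomes "\t").
    lines.push("document.selection.Text=" + JSON.stringify(delimiter) + ";");
  }
  lines.push("document.selection.StartOfLine(false,eeLineView | eeLineHomeText);");
  lines.push("document.selection.LineDown(false,1);");
  return lines.join("\n");
}

const macroSource = buildMacroSource([8, 30, 15, 10, 13]);
console.log(macroSource.split("\n").length + " statements generated");
```

The output could be saved as a macro file and run with a repeat count, as described above. Like the recorded macro, it also inserts a delimiter after the last column.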
January 16, 2014 at 10:30 am #17872 no1 Participant
Thank you, Yutaka.
I have many files, and they are large (thousands of lines or more per file, about 200 columns per line, with column widths that vary). So I’m afraid recorded operation macros might not fit my case.
I wrote a script macro with the help of others. Could you (or anyone) take a look to see if anything in it should be amended, and whether the regular expressions could be optimized? (From the few I tested, shorter expressions seem to run faster.)
The files are large, so efficiency and stability should be considered.
Currently the only, but fatal, problem is:
The regular expression ^ would match the position after this character:
U+0085 <control> : NEXT LINE
How can I resolve it?
if (document.Encoding != eeEncodingBinary) {
    alert("TSV ASCII\n\nUse this macro in Binary (ASCII View) mode only!");
    Quit();
}
var str = "8,50,30,30,30,50,50,50,50,30,15,8,1,6,25,25,55,55,55,55,12,30,11,3,30,24,55,55,55,55,12,30,11,3,30,55,55,55,55,3,30,30,10,3,1,1,1,30,1,10,4,1,6,3,8,10,6,30,10,6,6,1,1,1,1,1,1,1,1,30,30,30,1,8,8,8,8,8,8,8,8,8,8,8,8,8,1,10,3,30,4,5,7,7,4,3,30,5,19,19,19,19,20,7,8,9,19,19,8,10,30,50,5,35,3,30,3,30,4,30,4,30,25,4,50,3,30,3,30,30,40,30,40,30,40,30,40,30,40,30,40,6,8,50,50,50,5,25,6,6,70,20,1,5,5,3,10,30,25,5,30,8,3,3,8,8,3,5,3,8,8,6,8,2,2,8,12,3,2,8,1,5,5,1";
str = prompt("[TSV ASCII] Enter width of each column: (e.g. 10,5,1,7)", str);
var arr = str.split(",");
Redraw = false;
// n is the running character offset of the next column boundary.
var n = 0;
for (var i = 0; i < arr.length - 1; i++) {
    n = n + parseInt(arr[i], 10);
    document.selection.Replace("^.{" + n + "}", "\\t", eeReplaceAll | eeFindReplaceRegExp);
    n = n + 1; // one more for the tab at this boundary
    //Monitor/Break:
    document.HighlightFind = false;
    Redraw = true;
    if (!confirm("Monitor/Break:\n\n" + i + " " + arr[i] + " " + n + " done.\n\nContinue?")) Quit();
    Redraw = false;
}
document.selection.Replace(" +\t", "\t", eeReplaceAll | eeFindReplaceRegExp);
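One way to sidestep the U+0085 behaviour is to run the same pass-by-pass idea outside the editor. The Node.js sketch below is not the poster's script and says nothing about how EmEditor's regex engine can be configured. It decodes the bytes as latin1 so that every byte is exactly one character, then applies one `^.{n}` pass per column boundary. In JavaScript's own `RegExp`, U+0085 is not a line terminator, so `^` with the `m` flag does not match after it. `delimitWithRegexPasses` is a made-up name, and the replacement `$&\t` (keep the match, append a tab) is this sketch's choice.

```javascript
// Sketch: regex passes over a byte-per-character view of the data.
// `delimitWithRegexPasses` is a hypothetical helper, not an EmEditor API.
function delimitWithRegexPasses(bytes, widths) {
  // latin1 maps each byte to one character in U+0000..U+00FF and back.
  let text = bytes.toString("latin1");
  let n = 0;
  for (let i = 0; i < widths.length - 1; i++) {
    n += widths[i];
    // Keep the first n characters of every line and append a tab.
    text = text.replace(new RegExp("^.{" + n + "}", "gm"), "$&\t");
    n += 1; // the tab just inserted shifts all later boundaries by one
  }
  // Remove padding spaces in front of each tab, like the final Replace in the post.
  text = text.replace(/ +\t/g, "\t");
  return Buffer.from(text, "latin1");
}

// A 3-byte field containing byte 0x85, then a 1-byte field, then CRLF.
const nelSample = Buffer.from([0x41, 0x85, 0x42, 0x43, 0x0d, 0x0a]);
console.log(delimitWithRegexPasses(nelSample, [3, 1]).toString("hex"));
```

Each pass rescans the whole text, so for about 200 columns a single slicing pass over the bytes is likely to be faster; this version is mainly useful for checking the offset arithmetic of the loop.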