2010-09-04, 04:34 AM
markbb1 Wrote: in hex (I think). It might be that when the sed script takes off the first instance on the line, the remaining characters then appear to our text renderers as extended ASCII instead of Chinese. Removing all instances might be the magic required.
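sed works character by character, and every byte of the 3-byte UTF-8 sequences used for Chinese is 0x80 or greater, so none of them can be mistaken for plain ASCII. That is why this should work in theory. If the file has somehow been converted to UTF-16 or some other Unicode encoding, all bets are off.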
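To illustrate the byte argument, here is a rough Python sketch. It is not the sed script from this thread, and `strip_high_bytes` is a made-up helper that only approximates a byte-wise delete of everything in the 0x80-0xFF range. It shows that the trick leaves clean ASCII behind for UTF-8 input, but leaves stray bytes behind for UTF-16 input.

```python
def strip_high_bytes(data: bytes) -> bytes:
    """Hypothetical helper: drop every byte that is 0x80 or greater."""
    return bytes(b for b in data if b < 0x80)


# Two Chinese characters (U+4E2D, U+6587) surrounded by plain ASCII.
chinese = "\u4e2d\u6587"
sample = "abc " + chinese + " def"

# In UTF-8 each of these characters is 3 bytes, all of them >= 0x80.
utf8_bytes = chinese.encode("utf-8")
print(len(utf8_bytes), all(b >= 0x80 for b in utf8_bytes))  # 6 True

# So removing the high bytes leaves only the ASCII text.
print(strip_high_bytes(sample.encode("utf-8")))  # b'abc  def'

# In UTF-16 the same characters contain bytes below 0x80,
# so the same removal leaves fragments behind.
print(strip_high_bytes(chinese.encode("utf-16-le")))  # b'-Ne'
```

The last line is the "all bets are off" case: in UTF-16 some bytes of the Chinese characters fall in the ASCII range, so they survive and show up as junk.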
Martin