I have searched many posts but could not find an answer. Below is my script, in which I am trying to perform a 'sed' operation by writing a program.
import sys

def sed(pattern, replace, source, dest):
    fin = open(source, 'r')
    fout = open(dest, 'w')
    for line in fin:
        line = line.replace('\x00', '')
        line = line.replace(pattern, replace)
        print(line)
        fout.write(line)
    fin.close()
    fout.close()

def main(name):
    pattern = 'to be'
    replace = 'is'
    source = 'C:\....\input.txt'
    dest = 'C:\...\output.txt'
    sed(pattern, replace, source, dest)

if __name__ == '__main__':
    main(*sys.argv)
I am reading data from an input text file, replacing the string, and writing the complete text, including the replacement, to an output text file.
I can see the replaced string in the output of 'print(line)', but when I open output.txt, it shows what looks like Chinese text.
Please let me know how to get the same data into the output text file.
I believe that you are using Python 2, not Python 3. Your input file is encoded as UTF-16, but it is being opened with the default file encoding. This is why you have extra null characters (\x00) that you remove.
The output file is then written with the UTF-16 byte order mark (BOM) (0xFF 0xFE) as the first 2 bytes, but because the null bytes were removed, the value of each 2-byte UTF-16 character is altered. That's why it appears as Asian text when you view it. For example:
>>> b'to'.decode('utf16')
u'\u6f74'
>>> print(b'to'.decode('utf16'))
潴
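The whole chain can be reproduced in memory. The following is only a sketch, written for Python 3 and assuming the file was saved as little-endian UTF-16 with a BOM; latin-1 stands in for whatever single-byte default encoding was used to read it, and the sample text "to be." is made up for illustration.

```python
import codecs

text = "to be."  # hypothetical sample text, chosen for illustration

# UTF-16 as saved on disk: BOM followed by little-endian code units.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Reading with a single-byte encoding (latin-1 is an assumed stand-in for
# the default) yields one character per byte, so every other one is a null.
misread = raw.decode("latin-1")

# Stripping the nulls keeps the BOM but halves the payload.
stripped = misread.replace("\x00", "")
written = stripped.encode("latin-1")

# A viewer that honours the BOM pairs up the remaining bytes as UTF-16.
garbled = written.decode("utf-16")
print(repr(garbled))
```

The first garbled character is the same u'\u6f74' as in the interpreter session above, because the bytes for 't' and 'o' are now read as a single code unit.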
One solution is to use Python 3 and supply the encoding argument when opening the files:
fin = open(source, 'r', encoding='utf16')
fout = open(dest, 'w', encoding='utf16')
If you must use Python 2, use io.open() to open the files:
import io
fin = io.open(source, 'r', encoding='utf16')
fout = io.open(dest, 'w', encoding='utf16')
In either case, you should use a with statement to ensure that the files are properly closed if an exception occurs:
def sed(pattern, replace, source, dest, encoding='utf16'):
    with open(source, 'r', encoding=encoding) as fin:
        with open(dest, 'w', encoding=encoding) as fout:
            for line in fin:
                line = line.replace(pattern, replace)
                fout.write(line)
You don't need to close the files explicitly, since they are closed automatically when the with block exits, in this case just before sed() returns.
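To check the behaviour end to end, here is a rough Python 3 sketch that round-trips a small UTF-16 file through sed(). The temporary directory, the file names, and the sample sentence are made up for the test; they are not the paths from the question.

```python
import os
import tempfile

def sed(pattern, replace, source, dest, encoding="utf16"):
    # Both files are opened with the same explicit encoding and are
    # closed when the with block exits.
    with open(source, "r", encoding=encoding) as fin, \
         open(dest, "w", encoding=encoding) as fout:
        for line in fin:
            fout.write(line.replace(pattern, replace))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, used only for this demonstration.
    source = os.path.join(tmp, "input.txt")
    dest = os.path.join(tmp, "output.txt")

    # Create a UTF-16 input file to stand in for the real one.
    with open(source, "w", encoding="utf16") as f:
        f.write("This has to be replaced\n")

    sed("to be", "is", source, dest)

    # Read the result back with the same encoding.
    with open(dest, "r", encoding="utf16") as f:
        result = f.read()

print(result)
```

Because the file is decoded properly on input, no null characters appear in the lines and there is nothing to strip before the replacement.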