Post History
#2: Post edited
Determine encoding of text
I have some text files which think they are encoded in UTF-8:

```
file test.txt
test.txt: Unicode text, UTF-8 text, with CRLF line terminators
```

However, if I look at their content, I think they might in reality use some other encoding:

```
ÒHi there. IÕm a test documentÓ
ÒTouchŽ.Ó
```

From context, this should read as

```
“Hi there. I'm a test document”
“Touché.”
```

How can I determine the original encoding of the text so that I can re-encode the file with `iconv` and get readable text?
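One way to probe this is sketched below. It is only a guess at the cause, not a definitive method. The assumption is that the text was written in one legacy 8-bit encoding, wrongly read as a different 8-bit encoding, and then saved as UTF-8. If so, re-encoding the garbled text with the wrong codec and decoding the bytes with the original one should restore it. The function names and both candidate lists are hypothetical choices for illustration; the sample string is copied from the garbled output above.

```python
from typing import Dict, Optional, Tuple

# Sample copied from the garbled file content shown in the question.
GARBLED = "ÒHi there. IÕm a test documentÓ\r\nÒTouchŽ.Ó\r\n"

# Assumed candidates (illustrative guesses, not an exhaustive list).
WRONG_GUESSES = ["cp1252", "latin-1"]             # codec the bytes were misread as
ORIGINAL_GUESSES = ["mac_roman", "cp437", "cp850"]  # codec the file was written in


def undo_mojibake(garbled: str, wrong: str, original: str) -> Optional[str]:
    """Re-encode with the wrongly applied codec, then decode with the original one.

    Returns None when the garbled text cannot be represented in `wrong`
    or the resulting bytes are not valid in `original`.
    """
    try:
        return garbled.encode(wrong).decode(original)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


def candidates(garbled: str) -> Dict[Tuple[str, str], str]:
    """Try every (wrong, original) pair and keep those that round-trip."""
    results = {}
    for wrong in WRONG_GUESSES:
        for original in ORIGINAL_GUESSES:
            fixed = undo_mojibake(garbled, wrong, original)
            if fixed is not None:
                results[(wrong, original)] = fixed
    return results


if __name__ == "__main__":
    # Print each surviving pair so the readable one can be picked by eye.
    for (wrong, original), fixed in candidates(GARBLED).items():
        print(f"{wrong} -> {original}: {fixed!r}")
```

For this sample, the `cp1252` / `mac_roman` pair gives `“Hi there. I’m a test document”` and `“Touché.”`, which suggests Mac Roman text that was misread as Windows-1252. The same two steps could then be chained with `iconv` (UTF-8 to the wrongly applied encoding, then from the original encoding back to UTF-8). Encoding names differ between `iconv` implementations, so check `iconv -l` first.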
#1: Initial revision
Determine encoding of text
I have some text files which think they are encoded in utf8:

```
file test.txt
test.txt: Unicode text, UTF-8 text, with CRLF line terminators
```

(https://github.com/samcarter/shared/blob/main/test.txt)

However if I look at their content, I think they might in reality have some other encoding:

```
ÒHi there. IÕm a test documentÓ
ÒTouchŽ.Ó
```

From context, this should read as

```
“Hi there. I'm a test document”
“Touché.”
```

How can I determine the original encoding of the text so that I can re-encode the file with `iconv` to hopefully get a readable text?