Welcome to the Power Users community on Codidact!
Power Users is a Q&A site for questions about the usage of computer software and hardware. We are still a small site and would like to grow, so please consider joining our community. We are looking forward to your questions and answers; they are the building blocks of a repository of knowledge we are building together.
### quick fix

I agree with Canina that you need to do *two* translations to fix this problem. Fortunately, it appears that you can recover the original text without loss.

Try this:

```
# first convert from UTF-8 to WINDOWS-1252
iconv -f UTF-8 -t WINDOWS-1252 < test.txt > junk.txt
# next re-interpret the text as "MAC OS Roman"
# and convert back to UTF-8
iconv -f MACINTOSH -t UTF-8 < junk.txt > output.txt
```

### details

I've had the same thing happen to curly quotes in my own files: text files I created on my old Macintosh were misinterpreted as ISO-8859-1 or ISO-8859-15 text.

Other options would work just as well to fix the curly quotes, because several different character encodings happen to put the letters that the curly quotes were turned into at the same byte values. For example:

```
# first convert from UTF-8 to ISO-8859-15
iconv -f UTF-8 -t ISO-8859-15 < test.txt > junk.txt
# next re-interpret the text as "MAC OS Roman"
# and convert back to UTF-8
iconv -f MACINTOSH -t UTF-8 < junk.txt > output.txt
```

That was the solution for my text, but it would mess up other letters in your particular text.

I used https://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing) to figure out which two character sets had the same byte value representing "Z with caron" in one set and "e with acute" in the other, etc. Fortunately, I saw there that WINDOWS-1252 lines up with the other letters in your text: it translates the UTF-8 bytes C5 BD (U+017D "Z with caron") to the single byte 8E, and the byte 8E, when re-interpreted as "MAC OS Roman", represents "e with acute" (U+00E9 in Unicode).

(I feel that named [HTML character entity references](https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references) are often a better way to represent characters than ambiguous raw binary codes, and they would have prevented such problems.)
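If you want to try the idea on a throwaway sample before touching your real file, here is a small sketch. It assumes a GNU/Linux or macOS `iconv` that accepts the encoding names used in the answer (`WINDOWS-1252`, `ISO-8859-15`, `MACINTOSH`); the file names `sample.txt`, `fixed.txt`, `quotes.txt` and `quotes_fixed.txt` are made up for the demo. It writes the mis-decoded bytes with `printf` octal escapes, shows the intermediate 8E byte with `od`, and then applies the same two conversions, once for the "Z with caron" case and once for curly quotes.

```shell
#!/bin/sh
# Demo only: the file names are hypothetical. POSIX printf has no \x escape,
# so the raw bytes are written in octal.
set -e

# Case 1: a Mac OS Roman "e with acute" (byte 0x8E) that was mis-read as
# WINDOWS-1252 and saved as UTF-8 shows up as U+017D "Z with caron",
# i.e. the UTF-8 bytes C5 BD (octal 305 275).
printf 'caf\305\275\n' > sample.txt

# First step on its own: WINDOWS-1252 stores U+017D as the single byte 0x8E.
iconv -f UTF-8 -t WINDOWS-1252 < sample.txt | od -An -tx1   # 63 61 66 8e 0a

# Both steps, piped instead of going through junk.txt:
# byte 0x8E re-read as Mac OS Roman is U+00E9 "e with acute".
iconv -f UTF-8 -t WINDOWS-1252 < sample.txt |
    iconv -f MACINTOSH -t UTF-8 > fixed.txt
cat fixed.txt   # café

# Case 2: Mac curly quotes (bytes 0xD2, 0xD3) mis-read as ISO-8859-1 turn
# into "O with grave" and "O with acute" (UTF-8 C3 92 and C3 93).
# ISO-8859-15 keeps those letters at 0xD2 and 0xD3, so it works here too.
printf '\303\222hi\303\223\n' > quotes.txt
iconv -f UTF-8 -t ISO-8859-15 < quotes.txt |
    iconv -f MACINTOSH -t UTF-8 > quotes_fixed.txt
cat quotes_fixed.txt   # “hi”
```

Piping one `iconv` into the other only skips the intermediate `junk.txt`; it performs the same two conversions as the commands in the answer.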