Post History

71%
+3 −0
Q&A Determine encoding of text

If your goal is to fix your files like David Cary's iconving does, but you can't tell the mis-encodings that transpired to create your text, you can use a little Python and the ftfy library[1] as f...

posted 1y ago by Michael · edited 1y ago by Michael

Answer
#3: Post edited by Michael · 2023-10-24T19:45:57Z (about 1 year ago)
Commentary
  • If your goal is to fix your files like [David Cary's `iconv`ing does][davidcary], but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and [the `ftfy` library][ftfy][^1] as [found in PyPI][pip] to undo the mess.
  • > ## Some quick examples
  • > Here are some examples (found in the real world) of what ftfy can do:
  • >
  • > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
  • >
  • > ```py
  • > >>> import ftfy
  • > >>> ftfy.fix_text('âœ” No problems')
  • > '✔ No problems'
  • > ```
  • > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
  • >
  • > ftfy can fix multiple layers of mojibake simultaneously:
  • >
  • > ```py
  • > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
  • > "The Mona Lisa doesn't have eyebrows."
  • > ```
  • [^1]: "Fixed that for you"
  • [davidcary]: https://powerusers.codidact.com/posts/289529/289602#answer-289602
  • [ftfy]: https://ftfy.readthedocs.io/en/latest/
  • [pip]: https://pypi.org/project/ftfy/
  • If your goal is to fix your files like [David Cary's `iconv`ing][davidcary] does, but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and [the `ftfy` library][ftfy][^1] as [found in PyPI][pip] to undo the mess.
  • > ## Some quick examples
  • > Here are some examples (found in the real world) of what ftfy can do:
  • >
  • > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
  • >
  • > ```python
  • > >>> import ftfy
  • > >>> ftfy.fix_text('âœ” No problems')
  • > '✔ No problems'
  • > ```
  • > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
  • >
  • > ftfy can fix multiple layers of mojibake simultaneously:
  • >
  • > ```python
  • > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
  • > "The Mona Lisa doesn't have eyebrows."
  • > ```
  • I learned about `ftfy` several years after I wrote some (much less rigorous) tools to detect and unscramble content that had made its way through one or more different encodings.
  • [^1]: "Fixed that for you"
  • [davidcary]: https://powerusers.codidact.com/posts/289529/289602#answer-289602
  • [ftfy]: https://ftfy.readthedocs.io/en/latest/
  • [pip]: https://pypi.org/project/ftfy/
#2: Post edited by Michael · 2023-10-24T16:11:37Z (about 1 year ago)
Link David's answer
  • If your goal is to fix your files like David Cary's `iconv`ing does, but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and the [`ftfy`][ftfy] library[^1] [in PyPi][pip] to undo the mess.
  • > ## Some quick examples
  • > Here are some examples (found in the real world) of what ftfy can do:
  • >
  • > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
  • >
  • > ```py
  • > >>> import ftfy
  • > >>> ftfy.fix_text('âœ” No problems')
  • > '✔ No problems'
  • > ```
  • > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
  • >
  • > ftfy can fix multiple layers of mojibake simultaneously:
  • >
  • > ```py
  • > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
  • > "The Mona Lisa doesn't have eyebrows."
  • > ```
  • [^1]: "Fixed that for you"
  • [ftfy]: https://ftfy.readthedocs.io/en/latest/
  • [pip]: https://pypi.org/project/ftfy/
  • If your goal is to fix your files like [David Cary's `iconv`ing does][davidcary], but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and [the `ftfy` library][ftfy][^1] as [found in PyPI][pip] to undo the mess.
  • > ## Some quick examples
  • > Here are some examples (found in the real world) of what ftfy can do:
  • >
  • > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
  • >
  • > ```py
  • > >>> import ftfy
  • > >>> ftfy.fix_text('âœ” No problems')
  • > '✔ No problems'
  • > ```
  • > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
  • >
  • > ftfy can fix multiple layers of mojibake simultaneously:
  • >
  • > ```py
  • > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
  • > "The Mona Lisa doesn't have eyebrows."
  • > ```
  • [^1]: "Fixed that for you"
  • [davidcary]: https://powerusers.codidact.com/posts/289529/289602#answer-289602
  • [ftfy]: https://ftfy.readthedocs.io/en/latest/
  • [pip]: https://pypi.org/project/ftfy/
#1: Initial revision by Michael · 2023-10-20T20:55:39Z (over 1 year ago)
If your goal is to fix your files like David Cary's `iconv`ing does, but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and the [`ftfy`][ftfy] library[^1] [in PyPi][pip] to undo the mess.

> ## Some quick examples
> Here are some examples (found in the real world) of what ftfy can do:
> 
> ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
>
> ```py
> >>> import ftfy
> >>> ftfy.fix_text('âœ” No problems')
> '✔ No problems'
> ```
> Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
> 
> ftfy can fix multiple layers of mojibake simultaneously:
>
> ```py
> >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
> "The Mona Lisa doesn't have eyebrows."
> ```
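
As a rough illustration of why the recovery described in the quote is possible at all, here is a minimal standard-library sketch of the basic round trip: re-encode the garbled string with the codec that was wrongly used to decode it, then decode the resulting bytes as UTF-8. This is not how `ftfy` is implemented. The function names `undo_one_layer` and `undo_layers` are made up for this example, and the sketch assumes the wrong codec was Windows-1252 (`cp1252`).

```python
def undo_one_layer(text: str, wrong_codec: str = "cp1252") -> str:
    """Reverse one 'UTF-8 bytes decoded as wrong_codec' mix-up, if possible.

    Hypothetical helper for illustration only; not part of ftfy.
    """
    try:
        # Recover the original bytes, then decode them the way they were meant.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters the wrong codec cannot produce,
        # or the bytes are not valid UTF-8: this is not that kind of mojibake.
        return text


def undo_layers(text: str, max_layers: int = 5) -> str:
    """Apply undo_one_layer repeatedly until the text stops changing."""
    for _ in range(max_layers):
        fixed = undo_one_layer(text)
        if fixed == text:
            break
        text = fixed
    return text


# Simulate the damage: UTF-8 bytes wrongly decoded as Windows-1252.
garbled = "✔ No problems".encode("utf-8").decode("cp1252")
recovered = undo_layers(garbled)  # back to the original string
```

Correctly decoded text such as `"café"` passes through unchanged, because its Windows-1252 bytes are not valid UTF-8 and the decode step fails. Unlike `ftfy`, this sketch handles only one wrong codec, leaves curly quotes as they are, and cannot round-trip the few byte values that Python's strict `cp1252` codec leaves undefined.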

[^1]: "Fixed that for you"

[ftfy]: https://ftfy.readthedocs.io/en/latest/
[pip]: https://pypi.org/project/ftfy/