Post History
If your goal is to fix your files like David Cary's iconving does, but you can't tell the mis-encodings that transpired to create your text, you can use a little Python and the ftfy library[1] as f...
Answer
#3: Post edited
If your goal is to fix your files like [David Cary's `iconv`ing does][davidcary], but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and [the `ftfy` library][ftfy][^1] as [found in PyPI][pip] to undo the mess.
- > ## Some quick examples
- > Here are some examples (found in the real world) of what ftfy can do:
- >
- > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
- >
- > ```py
- > >>> import ftfy
- > >>> ftfy.fix_text('âœ” No problems')
- > '✔ No problems'
- > ```
- > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
- >
- > ftfy can fix multiple layers of mojibake simultaneously:
- >
- > ```py
- > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
- > "The Mona Lisa doesn't have eyebrows."
- > ```
- [^1]: "Fixed that for you"
- [davidcary]: https://powerusers.codidact.com/posts/289529/289602#answer-289602
- [ftfy]: https://ftfy.readthedocs.io/en/latest/
- [pip]: https://pypi.org/project/ftfy/
- If your goal is to fix your files the way [David Cary's `iconv`ing][davidcary] does, but you _can't tell_ which mis-encodings your text went through, you can use a little Python and [the `ftfy` library][ftfy][^1], [available on PyPI][pip], to undo the mess.
- > ## Some quick examples
- > Here are some examples (found in the real world) of what ftfy can do:
- >
- > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
- >
- > ```python
- > >>> import ftfy
- > >>> ftfy.fix_text('âœ” No problems')
- > '✔ No problems'
- > ```
- > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
- >
- > ftfy can fix multiple layers of mojibake simultaneously:
- >
- > ```python
- > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
- > "The Mona Lisa doesn't have eyebrows."
- > ```
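The quoted examples fix single strings; for whole files, something like the following might do. This is only a sketch under stated assumptions: `fix_file` and its parameters are names made up for illustration (not part of `ftfy`), and it assumes the damaged file is itself valid UTF-8, i.e. the wrong characters were saved as ordinary text. The only `ftfy` call used is `ftfy.fix_text`.

```python
from pathlib import Path
from typing import Callable, Optional


def fix_file(src: str, dst: str,
             fixer: Optional[Callable[[str], str]] = None,
             encoding: str = "utf-8") -> bool:
    """Write a repaired copy of src to dst; return True if anything changed.

    Hypothetical helper. Assumes the damaged file can be read as UTF-8
    because the mojibake characters were stored as ordinary text.
    """
    if fixer is None:
        # Imported lazily so the helper can be tried with a stand-in fixer.
        import ftfy  # third-party: pip install ftfy
        fixer = ftfy.fix_text

    original = Path(src).read_text(encoding=encoding)
    repaired = fixer(original)
    # Write to a separate file so the original stays available for comparison.
    Path(dst).write_text(repaired, encoding=encoding)
    return repaired != original
```

Calling `fix_file("notes.txt", "notes.fixed.txt")` (file names are placeholders) leaves the original untouched, which seems prudent because the repair is heuristic and worth diffing before you overwrite anything.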
- I learned about `ftfy` several years after I wrote some (much less rigorous) tools to detect and unscramble content that had made its way through one or more different encodings.
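To give a feel for what such a "much less rigorous" tool can look like, here is a standard-library-only sketch. It is not how `ftfy` works internally, and `garble` and `ungarble` are hypothetical names. It handles exactly one mix-up, UTF-8 bytes that were decoded as Windows-1252, and it relies on the point made in the quote above: a wrong guess usually fails to decode as UTF-8, so the failure itself tells you when to stop.

```python
def garble(text: str, layers: int = 1) -> str:
    """Simulate mojibake: encode as UTF-8, then mis-decode as Windows-1252."""
    for _ in range(layers):
        text = text.encode("utf-8").decode("cp1252")
    return text


def ungarble(text: str, max_layers: int = 5) -> str:
    """Undo UTF-8-read-as-Windows-1252 mojibake, one layer at a time.

    Stops as soon as a round trip fails or changes nothing, which signals
    that the text is no longer (or never was) this kind of mojibake.
    """
    for _ in range(max_layers):
        try:
            candidate = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break
        if candidate == text:
            break
        text = candidate
    return text


broken = garble("✔ No problems")
print(broken)            # âœ” No problems
print(ungarble(broken))  # ✔ No problems
```

Unlike `ftfy`, this knows nothing about other encodings and will mis-repair text that legitimately contains a sequence such as `Ã©`, which is much of the reason to prefer the library.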
- [^1]: "Fixed that for you"
- [davidcary]: https://powerusers.codidact.com/posts/289529/289602#answer-289602
- [ftfy]: https://ftfy.readthedocs.io/en/latest/
- [pip]: https://pypi.org/project/ftfy/
#2: Post edited
If your goal is to fix your files like David Cary's `iconv`ing does, but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and the [`ftfy`][ftfy] library[^1] [in PyPi][pip] to undo the mess.
- > ## Some quick examples
- > Here are some examples (found in the real world) of what ftfy can do:
- >
- > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
- >
- > ```py
- > >>> import ftfy
- > >>> ftfy.fix_text('âœ” No problems')
- > '✔ No problems'
- > ```
- > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
- >
- > ftfy can fix multiple layers of mojibake simultaneously:
- >
- > ```py
- > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
- > "The Mona Lisa doesn't have eyebrows."
- > ```
- [^1]: "Fixed that for you"
- [ftfy]: https://ftfy.readthedocs.io/en/latest/
- [pip]: https://pypi.org/project/ftfy/
- If your goal is to fix your files like [David Cary's `iconv`ing does][davidcary], but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and [the `ftfy` library][ftfy][^1] as [found in PyPI][pip] to undo the mess.
- > ## Some quick examples
- > Here are some examples (found in the real world) of what ftfy can do:
- >
- > ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
- >
- > ```py
- > >>> import ftfy
- > >>> ftfy.fix_text('âœ” No problems')
- > '✔ No problems'
- > ```
- > Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
- >
- > ftfy can fix multiple layers of mojibake simultaneously:
- >
- > ```py
- > >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
- > "The Mona Lisa doesn't have eyebrows."
- > ```
- [^1]: "Fixed that for you"
- [davidcary]: https://powerusers.codidact.com/posts/289529/289602#answer-289602
- [ftfy]: https://ftfy.readthedocs.io/en/latest/
- [pip]: https://pypi.org/project/ftfy/
#1: Initial revision
If your goal is to fix your files like David Cary's `iconv`ing does, but you _can't tell_ the mis-encodings that transpired to create your text, you can use a little Python and the [`ftfy`][ftfy] library[^1] [in PyPi][pip] to undo the mess.

> ## Some quick examples
> Here are some examples (found in the real world) of what ftfy can do:
>
> ftfy can fix mojibake (encoding mix-ups), by detecting patterns of characters that were clearly meant to be UTF-8 but were decoded as something else:
>
> ```py
> >>> import ftfy
> >>> ftfy.fix_text('âœ” No problems')
> '✔ No problems'
> ```
> Does this sound impossible? It’s really not. UTF-8 is a well-designed encoding that makes it obvious when it’s being misused, and a string of mojibake usually contains all the information we need to recover the original string.
>
> ftfy can fix multiple layers of mojibake simultaneously:
>
> ```py
> >>> ftfy.fix_text('The Mona Lisa doesnÃƒÂ¢Ã¢â€šÂ¬Ã¢â€žÂ¢t have eyebrows.')
> "The Mona Lisa doesn't have eyebrows."
> ```

[^1]: "Fixed that for you"

[ftfy]: https://ftfy.readthedocs.io/en/latest/
[pip]: https://pypi.org/project/ftfy/