Charset normaliser

charset_normalizer is a Python package that lets you read text from a file when you don't know in advance which encoding it uses.
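As a rough sketch of how that can look in practice, the function below passes raw bytes to charset_normalizer's `from_bytes` and takes the best match. The name `decode_unknown` and the fallback list of encodings are illustrative assumptions, not part of the package, and the fallback only exists so the sketch still runs when the package is not installed.

```python
def decode_unknown(raw: bytes, fallbacks=("utf-8", "cp1250")) -> str:
    """Decode bytes whose encoding is not known in advance.

    `decode_unknown` and `fallbacks` are hypothetical names for this sketch.
    """
    try:
        # Third-party package: pip install charset-normalizer
        from charset_normalizer import from_bytes
    except ImportError:
        from_bytes = None

    if from_bytes is not None:
        best = from_bytes(raw).best()  # None when no plausible encoding is found
        if best is not None:
            return str(best)  # str() yields the decoded text

    # Assumed fallback: try a short list of likely encodings in order.
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue

    # Last resort: keep going, marking undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

For a file on disk, something like `decode_unknown(Path("statement.csv").read_bytes())` would do, where the file name is only a placeholder. Detection is a guess, so short or ambiguous inputs may come back in a different encoding than the one the file was written in.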

Not unrelated: the midata initiative allows consumers to download their transactional data in a consistent CSV format. Santander chose to implement this as a semicolon-separated CSV file in the cp1250 (Windows-1250) encoding. Imagine the state of the systems involved in generating that file…
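When the encoding and delimiter are known, as they are here, the standard library is enough. The sketch below decodes cp1250 bytes and parses them as semicolon-separated CSV. The function name `parse_statement` and the column names in the sample are made up for illustration; the real file's layout is not shown here.

```python
import csv
import io


def parse_statement(raw: bytes) -> list[dict[str, str]]:
    """Parse a semicolon-separated, cp1250-encoded CSV into a list of rows.

    `parse_statement` is a hypothetical helper for this sketch.
    """
    text = raw.decode("cp1250")
    # newline="" leaves line-ending handling to the csv module.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")
    return list(reader)


# Made-up sample data; the column names are assumptions.
sample = (
    "Date;Description;Amount\r\n"
    "01/02/2020;Łódź café;-3,50\r\n"
).encode("cp1250")

rows = parse_statement(sample)
```

Each row comes back as a dictionary keyed by the header line, with every value still a string, so amounts written with a decimal comma would need converting separately.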