I was looking for a way to easily determine a file's charset when I stumbled upon the Universal Encoding Detector. Just wanted to share it.
Installation:
$ wget http://chardet.feedparser.org/download/chardet-1.0.1.tgz -O - | tar xz
$ cd chardet-1.0.1
$ python ./setup.py build
$ sudo python ./setup.py install
Usage:
From a Python console:
>>> import chardet
>>> chardet.detect(open('/path/to/your/file', 'rb').read())
{'confidence': 0.98999999999999999, 'encoding': 'utf-8'}
Nice!
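For big files, reading the whole thing just to guess the charset is probably overkill. Here is a minimal sketch of a small wrapper that only hands the first chunk of the file to chardet.detect. The names detect_file_encoding, sample_size and the detect parameter are my own and not part of chardet, and the 64 KB default is an arbitrary assumption, not a recommendation from the library.

```python
def detect_file_encoding(path, sample_size=64 * 1024, detect=None):
    """Guess a file's encoding from its first `sample_size` bytes.

    Hypothetical helper, not part of chardet. `detect` is any callable
    that takes bytes and returns a result dict; it defaults to
    chardet.detect.
    """
    if detect is None:
        # Imported lazily so the helper can be exercised with a stand-in.
        import chardet
        detect = chardet.detect

    # Open in binary mode: the detector works on raw bytes, not decoded text.
    with open(path, 'rb') as f:
        sample = f.read(sample_size)

    return detect(sample)
```

Calling detect_file_encoding('/path/to/your/file') should return the same kind of dict as above, with 'encoding' and 'confidence' keys. A smaller sample is faster but can lower the confidence, or give a different guess if the telling bytes come later in the file.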