This post is more than a year old. Some of the information may no longer be accurate.
Dealing with data in text files sometimes gives you headaches with file encoding. A tool that helps overcome this problem is iconv. iconv converts text from one character encoding to another.
Check Contents
Examine the file in binary mode with the editor vim.
vim -b input.csv
If you detect problems with umlauts and Windows line endings (^M) like this:
"R<e4>fis, Stationsstrasse",Buchs (SG)^M
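Convert the line endings to Unix/Linux format with dos2unix. It is a text file format converter from DOS/Mac to Unix. Here dos2unix fixes only the line endings; the <e4> byte is an encoding issue, which the iconv conversion further down takes care of.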
dos2unix input.csv
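If dos2unix is not installed, the same check and fix can be sketched with standard tools. This is only a minimal sketch, not part of the original workflow: sample.csv and sample-unix.csv are hypothetical file names, and the sample line is recreated with printf so the snippet runs on its own.

```shell
# Recreate the sample line: byte 0xE4 (octal 344) is "ä" in ISO-8859-1, followed by CRLF.
printf '"R\344fis, Stationsstrasse",Buchs (SG)\r\n' > sample.csv

# A literal carriage return, kept in a variable for use in the grep pattern.
cr=$(printf '\r')

# Count lines that end in CR (Windows line endings). LC_ALL=C makes grep work byte-wise.
cr_before=$(LC_ALL=C grep -c "$cr\$" sample.csv || true)

# Delete every CR byte and write the result to a new file.
tr -d '\r' < sample.csv > sample-unix.csv

# Count again on the cleaned file.
cr_after=$(LC_ALL=C grep -c "$cr\$" sample-unix.csv || true)
echo "lines with CR before: $cr_before, after: $cr_after"
```

Unlike dos2unix, tr deletes every CR byte, including any that are not part of a line ending, so it is best used on files where CR only appears in CRLF pairs.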
Character Sets
To list all supported character sets:
iconv -l
# iconv --list
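The list is long, so it can help to test for one name directly. The following is a rough sketch under the assumption that your iconv supports -l; charset_supported is a hypothetical helper name, and the exact layout of the list differs between implementations, which is why the output is split into one name per line first.

```shell
# Return success if the given name appears in the output of `iconv -l`.
charset_supported() {
    # Split the list on commas, slashes and spaces so each name is on its own line,
    # then look for an exact, case-insensitive, fixed-string match.
    iconv -l | tr ',/ ' '\n\n\n' | grep -qixF -- "$1"
}

# NO-SUCH-CHARSET is a made-up name used to show the negative case.
for name in ISO-8859-1 UTF-8 NO-SUCH-CHARSET; do
    if charset_supported "$name"; then
        echo "$name: supported"
    else
        echo "$name: not listed"
    fi
done
```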
Convert to Unicode
Convert the contents of input.csv from ISO-8859-1 to UTF-8 and write the result to output-UTF_8.csv.
iconv -f ISO-8859-1 -t UTF-8 input.csv -o output-UTF_8.csv
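To convince yourself that a conversion worked, you can validate the result and convert it back. This is a small sketch, with hypothetical file names (sample-latin1.csv, sample-utf8.csv) and a one-line sample created by printf; it uses shell redirection instead of -o, which should behave the same way.

```shell
# Sample input: "Räfis" with "ä" stored as the single ISO-8859-1 byte 0xE4 (octal 344).
printf 'R\344fis\n' > sample-latin1.csv

# ISO-8859-1 to UTF-8, written with shell redirection instead of -o.
iconv -f ISO-8859-1 -t UTF-8 sample-latin1.csv > sample-utf8.csv

# Check 1: the output decodes cleanly as UTF-8 (iconv exits non-zero on invalid input).
iconv -f UTF-8 -t UTF-8 sample-utf8.csv > /dev/null && echo "valid UTF-8"

# Check 2: converting back reproduces the original bytes.
iconv -f UTF-8 -t ISO-8859-1 sample-utf8.csv | cmp -s - sample-latin1.csv && echo "round trip ok"
```

Note that the round trip only proves the conversion is reversible. It cannot tell you whether ISO-8859-1 was the right guess for the source encoding in the first place.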
Another example: ISO-8859-1 to UTF-16
iconv -f ISO-8859-1 -t UTF-16 input.csv -o output-UTF_16.csv
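The plain name UTF-16 leaves the byte order to the iconv implementation, which typically also writes a byte order mark at the start of the output. If the program reading the file expects a specific byte order, naming it explicitly may be safer. A short sketch, assuming your iconv knows the name UTF-16LE (check with iconv -l):

```shell
# Three ASCII characters ("a", "b", newline) as input.
# UTF-16LE fixes the byte order to little-endian; each of these characters takes 2 bytes.
bytes=$(printf 'ab\n' | iconv -f ISO-8859-1 -t UTF-16LE | wc -c | tr -d ' ')
echo "UTF-16LE size: $bytes bytes"

# Show the raw bytes to inspect the byte order.
printf 'ab\n' | iconv -f ISO-8859-1 -t UTF-16LE | od -An -tx1
```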