ASCII text, with very long lines, with CRLF line terminators
However, when I run csvcut datafile, I get:
Your file is not "utf-8" encoded. Please specify the correct encoding with the -e flag. Use the -v flag to see the complete error.
and when I run csvcut -e ASCII datafile, I get:
Your file is not "ASCII" encoded. Please specify the correct encoding with the -e flag.
(Neither changing the capitalization nor copy-pasting the exact output of the file command helps.)
The full error (with -v) is:
Traceback (most recent call last):
  File "/usr/local/bin/csvcut", line 9, in <module>
    load_entry_point('csvkit==0.9.2', 'console_scripts', 'csvcut')()
  File "/usr/local/lib/python2.7/dist-packages/csvkit-0.9.2-py2.7.egg/csvkit/utilities/csvcut.py", line 64, in launch_new_instance
    utility.main()
  File "/usr/local/lib/python2.7/dist-packages/csvkit-0.9.2-py2.7.egg/csvkit/utilities/csvcut.py", line 53, in main
    for row in rows:
  File "/usr/local/lib/python2.7/dist-packages/csvkit-0.9.2-py2.7.egg/csvkit/unicsv.py", line 51, in next
    row = next(self.reader)
  File "/usr/local/lib/python2.7/dist-packages/six.py", line 535, in next
    return type(self).__next__(self)
  File "/usr/local/lib/python2.7/dist-packages/csvkit-0.9.2-py2.7.egg/csvkit/unicsv.py", line 35, in __next__
    return next(self.reader).encode('utf-8')
  File "/usr/lib/python2.7/codecs.py", line 615, in next
    line = self.readline()
  File "/usr/lib/python2.7/codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib/python2.7/codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
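The last line of that traceback can be reproduced in isolation. Below is a minimal Python 3 sketch (the traceback itself comes from Python 2.7, and the helper name ascii_error is hypothetical) showing that byte 0xc2 falls outside the 7-bit range 0x00-0x7F that the 'ascii' codec accepts:

```python
def ascii_error(raw):
    """Return (codec, offending byte, offset) if `raw` is not valid ASCII,
    or None if every byte is in the range 0x00-0x7F."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte,
        # the "position 0" reported in the traceback above
        return exc.encoding, raw[exc.start], exc.start
    return None


print(ascii_error(b"\xc2"))  # → ('ascii', 194, 0)
```

194 is 0xc2 in decimal.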
Answer by 4ae1e1:
Your payload is neither ASCII nor UTF-8 encoded. You can quickly find the non-ASCII bits:
awk '/[^\x00-\x7F]/{ print NR ":", $0 }' data.csv | less
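A rough Python 3 equivalent of that awk one-liner, as a sketch: it reads the file as raw bytes, so it does not depend on the terminal's or locale's encoding. The function name non_ascii_lines is hypothetical, not part of csvkit.

```python
import sys


def non_ascii_lines(lines):
    """Yield (line_number, line) for every byte string in `lines` that
    contains a byte above 0x7F, i.e. outside 7-bit ASCII."""
    for lineno, line in enumerate(lines, start=1):
        if any(byte > 0x7F for byte in line):
            yield lineno, line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Binary mode: no decoding happens. Iteration splits on b"\n",
    # so CRLF lines keep their b"\r\n" ending until stripped below.
    with open(sys.argv[1], "rb") as fh:
        for lineno, line in non_ascii_lines(fh):
            text = line.rstrip(b"\r\n")
            # repr() shows offending bytes as \xNN escapes instead of
            # letting the terminal render them as replacement characters
            print(f"{lineno}: {text!r}")
```

Saved as, say, find_non_ascii.py (a hypothetical name), it would be run as python3 find_non_ascii.py data.csv | less. Seeing the actual \xNN values makes it easier to guess the encoding than looking at replacement characters.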
You'll see things like Briarcliffe College�??Patchogue in a UTF-8 encoded terminal emulator, suggesting that this is not a UTF-8 encoded file. And the first guess of encoding? ISO 8859-1, Western European. Let's test:
# piping to /dev/null to suppress printing and speed up processing (printing to tty is slow)
csvcut -e iso-8859-1 data.csv >/dev/null
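The same trial and error can be scripted. This is a minimal sketch that assumes the whole file fits in memory. The helper first_working_encoding and the candidate list are illustrative only, not anything csvkit provides. Note that ISO 8859-1 maps every byte 0x00-0xFF to a character, so decoding with it never fails. Here it only acts as a catch-all. A clean decode does not prove ISO 8859-1 is the right encoding, so check that the decoded text actually reads correctly.

```python
import sys

# Tried in order; iso-8859-1 goes last because it decodes any byte sequence.
CANDIDATES = ("ascii", "utf-8", "iso-8859-1")


def first_working_encoding(raw, candidates=CANDIDATES):
    """Return the first codec in `candidates` that decodes `raw` without
    error, or None if none of them do."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(first_working_encoding(fh.read()))
```

For example, first_working_encoding(b"College\xc2") returns 'iso-8859-1', because a lone 0xc2 is neither valid ASCII nor a complete UTF-8 sequence.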