I recently downloaded a CSV file intending to extract some data to satisfy my curiosity about something. I wrote a little Perl script to slice and dice the data, and that would have been that - except I wanted to check something quickly in the original file, so I did something like "grep whatever 2003.csv".
I got back nothing.
That's odd, I thought. I know that "whatever" is in there. So I fired up vim and did "/whatever" and, sure enough, there it was.
So why couldn't I extract it with grep?
Hmm. Let's try "more". Oops! After warning me that "2003.csv may be a binary file. See it anyway?", "more" showed me a mess.
Well, duh, that's why I couldn't grep the file - the darn thing is UTF-16!
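Looking at the bytes shows why grep came up empty. In UTF-16 each ASCII character is stored as two bytes, one of them zero, so the plain byte string grep searches for never occurs in the file. Those embedded zero bytes are also one reason a pager may flag the file as binary. The Python sketch below is only an illustration: the CSV row and the helper names are made up, and it assumes a little-endian file with a byte-order mark, which is one common layout for UTF-16 files.

```python
import codecs

def utf16le_with_bom(text: str) -> bytes:
    """Encode text the way a UTF-16LE file with a byte-order mark stores it."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def has_utf16_bom(data: bytes) -> bool:
    """True if data starts with a UTF-16 byte-order mark (either byte order)."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

raw = utf16le_with_bom("2003,whatever,42\n")   # hypothetical CSV row

# grep looks for the ASCII bytes b"whatever", but in UTF-16 every letter is
# followed by a zero byte, so that exact byte sequence never appears.
print(b"whatever" in raw)                      # False
print("whatever".encode("utf-16-le") in raw)   # True
# Zero bytes like these are one reason a pager may treat the file as binary.
print(b"\x00" in raw)                          # True
print(has_utf16_bom(raw))                      # True
```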
So what can you do when faced with this situation? You have a few choices. You could ask vim to rewrite it. That's easy:
:set fileencoding=utf-8
:w
Vim can do all sorts of file encoding conversions; see "Using another encoding" in the Vim docs.
At the Terminal command line, you can use "iconv":
iconv -f utf-16 -t utf-8 2003.csv | grep whatever
That gets old fast, though, so I just converted the file.
Wouldn't it have been nice if we had never had 7- or 8-bit encodings?
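Converting the file once is the same idea as the iconv pipeline, just written to a new file. Here is a minimal Python sketch, assuming the file starts with a byte-order mark (Python's "utf-16" codec uses the mark to pick the byte order and drops it from the decoded text). The function names and the output filename are hypothetical. It works on raw bytes so the file's original line endings pass through unchanged.

```python
from pathlib import Path

def utf16_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 (byte order taken from the BOM) and re-encode as UTF-8."""
    return raw.decode("utf-16").encode("utf-8")

def convert_to_utf8(src: str, dst: str) -> None:
    """Write a UTF-8 copy of a UTF-16 file, roughly like
    iconv -f utf-16 -t utf-8 src > dst."""
    # Reading and writing bytes (not text) keeps CRLF line endings as they are.
    Path(dst).write_bytes(utf16_to_utf8(Path(src).read_bytes()))

# Hypothetical usage; afterwards "grep whatever 2003-utf8.csv" can match:
# convert_to_utf8("2003.csv", "2003-utf8.csv")
```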
I had a client on a Red Hat system complain that he was getting an error from cron. A script in /etc/cron.weekly was failing with "No such file or directory", but the file was there, so it made no immediate sense.
The error seemed rather definite:
/etc/cron.weekly/procmail-users: /usr/bin/run-parts: /etc/cron.weekly/procmail-users: No such file or directory
I figured it was going to be a symlink problem or an incorrect shebang line, but no, everything looked fine, and you'd get the same error running it from the command line.
I kept looking and looking at this until I noticed that while editing it in vi, a little "[dos]" appeared next to the file name at the bottom of the screen.
Aha. Running "file proc*" confirmed that it had CRLF line endings. Normally I'd expect to see ^M's in vi, but I didn't. That puzzles me a little, but since the script was just a one-liner, I removed and recreated it manually, and of course now it works.
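The error most likely comes from the first line: with CRLF endings it really reads "#!/bin/sh\r" (or whatever interpreter the script names), so the kernel goes looking for an interpreter whose name ends in a carriage return, and that is the file it can't find. (The same "[dos]" flag explains the missing ^M's: vim recognized the file as DOS format and hides the carriage returns while editing.) The Python sketch below, with hypothetical function names, pulls the interpreter out of a #! line, splitting at the first space or tab roughly the way Linux does, and lists files in a directory whose interpreter ends in CR.

```python
from pathlib import Path
from typing import List, Optional

def shebang_interpreter(data: bytes) -> Optional[bytes]:
    """Return the interpreter named on a #! first line, or None.

    Stops at the first space or tab; a trailing carriage return is NOT
    stripped, which is exactly what goes wrong with CRLF scripts.
    """
    first_line = data.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        return None
    rest = first_line[2:].lstrip(b" \t")
    for sep in (b" ", b"\t"):
        rest = rest.split(sep, 1)[0]
    return rest

def crlf_scripts(directory: str) -> List[str]:
    """Names of files in directory whose #! interpreter ends in a CR."""
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            interp = shebang_interpreter(path.read_bytes())
            if interp is not None and interp.endswith(b"\r"):
                hits.append(path.name)
    return hits

print(shebang_interpreter(b"#!/bin/sh\r\necho hi\r\n"))   # b'/bin/sh\r'
# Hypothetical usage (reading some cron scripts may require root):
# print(crlf_scripts("/etc/cron.weekly"))
```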
You can also fix it in vim:
:set ff=unix
Then write the file, and DOS or Mac line endings will be converted to Unix.
Of course, there are also :set ff=dos and :set ff=mac.
You can be more verbose if you wish:
:set fileformat=unix
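What :set ff=unix does to the line endings can be sketched in a few lines of Python. This is a hypothetical helper, not vim's code: it normalizes CRLF (DOS) and lone CR (old Mac) endings to LF, then emits the ending for the requested format, borrowing vim's unix/dos/mac names. One caveat: a lone CR that was meant as data rather than a line ending gets converted too.

```python
from pathlib import Path

# Line endings keyed by the names vim uses for 'fileformat'.
EOL = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}

def convert_line_endings(data: bytes, fileformat: str = "unix") -> bytes:
    """Rewrite DOS, Mac or Unix line endings as those of fileformat."""
    # Order matters: collapse CRLF pairs first, then any remaining lone CR.
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", EOL[fileformat])

def fix_line_endings_in_place(path: str, fileformat: str = "unix") -> None:
    """Rewrite a file in place, roughly like :set ff=... followed by :w."""
    p = Path(path)
    p.write_bytes(convert_line_endings(p.read_bytes(), fileformat))
```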
Nowadays, you may run into UTF-8 vs. UTF-16 problems too; see "Converting File Encodings".
© 2013-07-31 Anthony Lawrence