Saturday, May 12, 2012

Converting MAC “CR” Line Endings to UNIX “LF”


If your file loads in looking something like this:


it is because you’ve saved it using a MAC formatted file structure, probably using MAC Excel, as MACOS 10.x uses UNIX style line endings.  If you then open this file under VIM running in Linux and VIM is not set to automatically detect it, you get this mess.   This happens because old-school MACs used CR line endings, and Unix expects LF type line endings.

Most UNIX based utilities and scripting languages do not know how to interpret a CR type line ending.  If you try to parse the CSV file with any UNIX based utility or scripting language, you are going to have trouble because the function is going to treat the entire file as a single line.  Most PHP file handling functions also have trouble with it.

So you need to convert the line endings.  There are numerous ways to do this such as using TR or Perl:

tr '\n' '\r' < mac_formatted _file.csv

perl -ne 's/([^\r])\r/$1\n/g; s/\r//g; print;'  mac_format.csv

But the easiest way I have found is to just use the VIM editor. 

First, check your current file format settings with a:   
:set ffs? 

If you are having trouble reading MAC formatted files, you will likely find this produces a:
Fileformats=unix,dos

A simple execution of:
:set ffs=unix,dos,mac

will generally easily fix this.   Exit the file and open it again, and you will now be able to see your text correctly.  However, you haven’t actually converted it.  You’ve just taught VIM how to automatically understand the CR type line endings.

VIM can be in any one of three file editing modes: dos, unix, or mac.  VIM attempts to figure out the correct mode when it loads the file.  But if the file format settings (ffs) do not contain the mac entry, VIM does not know how to read MAC files.

You can convert the line endings by first switching the editor into MAC mode so it can understand the file, switching the file format over to UNIX, and then writing out the converted file:
  1. :e ++ff=MAC
  2. :setlocal ff=unix
  3. :w 

The VIM “:e ++ff=mac” command switches VIM over to look for ‘\r’ (Carriage Return or CR) characters as line endings.  This will show up as “^M” in the file if the file is being interpreted as a UNIX file.  Unix uses a pure ‘\n’ (Line Feed or LR) as the only line-ending indicator.  The UNIX LF character shows up as a “^J” if you switch the editor over to UNIX mode.  If you switch a UNIX file to DOS mode, it will display correctly, but it will show a display at the bottom saying “[CR missing][dos]” right after you execute the “:e ++ff=dos” command.

You can switch back and forth into any mode you want by executing:
  1. :e ++ff=current_format
  2. :setlocal ff=target_format
  3. :w

where “current_format” and “target_format” are:  dos, unix, or mac

No comments:

Post a Comment