Load mysqldump CSVs into Ruby FasterCSV Apr 11

Working with mysqldump CSV files using the FasterCSV is a pain. mysqldump converts NULL fields into \N and it uses backslashes (‘\’) to escape characters. The only character in a well formed CSV that you need to escape is the double quote (‘"’). You can change the mysqldump escape characters and force double quotes around the fields, but FasterCSV::MalformedCsvError errors still occur.

After multiple tries I looked to shell scripting to get this working. Modifying large files this way wont be fast, but I do this with scripts that run in the background so I don’t mind the time it takes.

First thing is to dump your table(s) into CSV formats. The following mysqldump command outputs tab separated fields with no surrounding quotes.

  # Creates a "/path/to/store/csvs/your_table_here.txt" file
  $ mysqldump -u<user> \
  > -p<password> \
  > -t --tab=/path/to/store/csvs/ your_db_here your_table_here

Since the only double quote in the file belongs to the content, I can escape them with sed. At the same time, I like to change \N to NULL. This sed command does the trick.

  # Edit in place
  $ sed -i \
  > -e 's/"/""/g' \
  > -e 's/\t[[:space:]]*\\N/\tNULL/g' \
  > /path/to/store/csvs/your_table_here.txt

Now that everything is properly escaped I double quote the fields and replace the tabs with commas.

  # Edit in place
  $ sed -i \
  > -e 's/\t/","/g' \
  > -e 's/^\(.*\)$/"\1"/g' \
  > /path/to/store/csvs/your_table_here.txt

You can combine both of those steps into one sed command, but I like having them separate in my scripts. It’s easier to digest.

Now you’re ready to use that file with FasterCSV (or CSV if you’re on ruby 1.9) with the default settings!

  FasterCSV.open('/path/to/store/csvs/your_table_here.txt') do |row|
    puts row[0], row[1] # ...
  end

I like to use a converter to convert those NULL fields to nils.

  FasterCSV.open(
    '/path/to/store/csvs/your_table_here.txt', 
    :converters => lambda{|f| f == "NULL" ? nil : f}
  ) do |row|
    puts row[0], row[1] # ...
  end