rake task to create seed file from spreadsheet
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

Hi, I'd like to take this field

from the worksheet called: Prioritized Approach Milestones
https://www.pcisecuritystandards.org/documents/Prioritized_Approach_for_PCI_DSS_v20.xls

and turn it into a seed file.
https://github.com/akshatpradhan/compliance_chimp/blob/master/db/seeds.rb

I was thinking about using this gem
https://github.com/zdavatz/spreadsheet

A PR would be great!

I feel my question may not be asked right so I’m hoping for some questions to help me clarify my question.

Thank you!

Should 1.1 be a separate standard than 1.1.1 even though they're in the same cell?
sguha over 5 years ago
Yes if you can separate 1.1.1 into its own name field, that would be great! Also for each standard, if you would be able to retain newline endings that would be great too.
akshatpradhan over 5 years ago
this can be done easily without gems if you export the xls to a csv. Otherwise, there are several gems. I think you can learn some tricks with an approach "from scratch"
kc00l over 5 years ago
I didn't forget about this! Do you think a tree based approach for requirements could be relevant? To explain better: requirement 1.1.1 is a child of requirement 1.1 which is a child of requirement 1. With a tree structure the user could get track of the completion of bigger features by providing proofs for sub-requirements. What do you think?
kc00l over 5 years ago
hey, no, don't worry about trees and sub requirements. 1.1 has its own and 1.1.1 has its own. The reason for that is because the folks who made PCI requirements clearly don't have an understanding of what sub requirements really mean.
akshatpradhan over 5 years ago
Also, I'm still working/thinking about the hidden_field thing. I'm asking one more person for their suggestion before trying to implement it.
akshatpradhan over 5 years ago
Ok, I'll just clean the list up and write the import from csv script to the database seed, then.
kc00l over 5 years ago
awarded to kc00l

Crowdsource coding tasks.

1 Solution

Winning solution
Tipped

Here's my pull request https://github.com/akshatpradhan/compliance_chimp/pull/19

I admit this was a bit long and I stumbled upon an annoying issue in parsing new lines from the csv file to the requirements collection. Anyway, here's how I did it.

  1. I first exported the excel file column containing the requirements name & description to a new file
  2. I cleaned the file up (removed first order requirements 1, 2, etc., split cells to separate requirements)
  3. I exported the file to a csv
  4. I opened the csv in Vim to clean it some more:
    1. to separate the single column to a "requirement" and a "description" column by using the awesome search&replace with Regex builtin functions (some were really complicated :%s:^\(\d\+\.\d\+\.\d\+\) :"\1",":gc and even more)
    2. removing some utf-8 wrong encoded characters (trial and error when doing rake db:seed, see below)
    3. adding markdown syntax to reproduce lists, newlines, etc. (http://daringfireball.net/projects/markdown/syntax)
  5. copied the csv in 'csv/' Rails app root
  6. added some lines in db/seeds.rb to parse the csv and create new requirement records from each row:

    require "csv"
    puts "importing Requirements table from csv"
    CSV.foreach("csv/PCI_requirements.csv", headers: true) do |row|
      puts "processing requirement #{row["requirement"]}"
      record = Requirement.new(name: row["requirement"], description: row["description"].gsub(/\\n/, "\n"))
      record.save!
    end
    

    One of the trickiest parts was when I discovered the new lines characters I added in the csv (\n) were somehow escaped (I couldn't figure whether it was the CSV library or Mongoid...) and I had to unescape the requirement description field before saving the record.

  7. The last needed part was adding a markdown renderer (I chose the newest Redcarpet) to the Gemfile, creating a view helper to initialize the renderer and adding this to requirements/index and users/show (some refactoring is needed here :-) )

Let me know if something isn't clear for you!