[ruby] Split email body by newlines

My Rails app processes incoming emails by splitting them into multiple lines. This is what I currently use on the plain text version of the body: lines = email.body.split("\n")

This works well unless a sentence is longer than about 74 characters, because most email clients then insert a line break automatically to stay within the line length recommended by RFC 2822 (no more than 78 characters per line).

Example email: https://gist.github.com/marckohlbrugge/39c17b928eb17d330d63

Looking at the plain text part there seems to be no way to discern between a line break added by the user versus the email client. You could ignore any line break happening at the 75th position, but I think there might be a chance of false positives. (I could be wrong.)
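To make the plain-text heuristic concrete, here is a minimal sketch of a slightly stricter variant. Instead of looking only at the position of the break, it treats a break as client-inserted when the first word of the next line would not have fit on the previous line. The method name `unwrap_soft_breaks` and the default `width` of 74 are assumptions for illustration, and false positives are still possible (for example, a user who happens to press Enter near the wrap column).

```ruby
# Hypothetical helper: rejoins lines that look like they were wrapped by the
# mail client. `width` is an assumed wrap column, not a value from any spec.
def unwrap_soft_breaks(text, width: 74)
  lines = []
  previous = nil # the previous physical line, before any joining

  text.split(/\r?\n/).each do |line|
    first_word = line[/\A\S+/] # nil for blank or indented lines

    # The break before `line` is treated as soft when its first word
    # would not have fit on the previous physical line.
    soft = previous && !previous.empty? && first_word &&
           previous.length + 1 + first_word.length > width

    if soft
      lines[-1] = "#{lines[-1]} #{line}"
    else
      lines << line
    end
    previous = line
  end

  lines
end
```

Blank lines and indented lines are always kept as hard breaks in this sketch, so paragraph separators survive.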

The HTML part has all the information we need, but I'm not sure about a universal way to process this. Is replacing every div and br with a newline and then stripping all other HTML elements enough? What about all the other block-element tags? What about inline elements styled as block-elements? What if an email doesn't have an HTML part?
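For reference, the "replace tags with newlines, then strip the rest" idea can be sketched with the standard library alone. The `BLOCK_TAGS` list, the `ENTITIES` table, and the method name `html_to_lines` are assumptions made for this illustration. The tag list is not exhaustive, and inline elements styled as blocks through CSS are not detected, which is exactly the limitation raised above.

```ruby
# Assumed, non-exhaustive list of tags treated as line boundaries.
BLOCK_TAGS = %w[address blockquote div dl dt dd h1 h2 h3 h4 h5 h6
                hr li ol p pre table tr ul].freeze

# Only a handful of common entities are decoded in this sketch.
ENTITIES = {
  "&nbsp;" => " ", "&lt;" => "<", "&gt;" => ">",
  "&quot;" => '"', "&#39;" => "'", "&amp;" => "&"
}.freeze

# Hypothetical helper: converts an HTML fragment into an array of text lines.
def html_to_lines(html)
  text = html.gsub(/<(script|style)\b.*?<\/\1>/mi, "") # drop non-content elements
  text = text.gsub(/[\r\n]+/, " ")                      # newlines in the source are not line breaks
  text = text.gsub(/<br\s*\/?>/i, "\n")                 # explicit line breaks
  text = text.gsub(/<\/?(?:#{BLOCK_TAGS.join('|')})\b[^>]*>/i, "\n")
  text = text.gsub(/<[^>]+>/, "")                       # strip all remaining tags
  text = text.gsub(/&(?:nbsp|lt|gt|quot|#39|amp);/, ENTITIES)
  text.split("\n").map(&:strip).reject(&:empty?)
end
```

Note that this sketch discards empty lines and collapses whitespace inside `pre` blocks. A real HTML parser would be more robust against malformed markup than these regular expressions.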

I did find some interesting code examples in this StackOverflow post, but replacing a list of html tags with newlines doesn't seem like a complete (exhaustive) solution.

Deliverables

Write a Ruby class that takes any Mail object ( https://github.com/mikel/mail ), splits its body into lines, and returns an array of those lines, regardless of whether the email is plain text only, HTML only, or both. Keep the aforementioned concerns in mind. Please include tests with different kinds of emails (short emails, long emails, HTML-only, text-only, mixed, etc.).

I know this has already expired, but maybe this would be of some help to you? I just saw it on Hacker News. http://parser.zapier.com/?welcome=back
akshatpradhan over 5 years ago
Thanks for the suggestion, but I prefer not to use an external service for this task.
marc over 5 years ago


0 Solutions