Simple ruby substring

I have a test string:
AAATCAGAAGTTAAGGCAGTGTTTTAGATGGCTCATTCACAACTATCTTTCCCCTTTAAATATGATTTATTGTCTYRTCTYATACACAGATGTATTGCTTGGTAAAAGAYTRGCCTCCARTCAAACCTGARCAGGCTATGGAACTTCTGGACTGTAATTACCCAGATCCTATGGTTCGARGTTTTGCTGTTCRRGTGCTTGGAAAAATATTTAACWGATRACAAACTTTCTCAGTATTTAATTCARCTAGTACARGtaaaataatgtaaaatagyraataatrtttaattacaataataattta

I need a ruby method:

  • It will accept 1 parameter - the string itself
  • It will parse the string and look for any letters which are not ACTG (case insensitive)
  • It will return an array of [position, length] arrays, one for each run of non-ACTG letters
  • It should be efficient, as strings can be 2MB or longer in length

eg:
[[4,1],[12,1],[140,2]....] <- note these locations do not match the string above

awarded to vanceza
Tags
ruby


2 Solutions

Winning solution

Here's a simple answer using regular expressions in Ruby. I tested it using Ruby 2.1.1, but it should work in previous versions.

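def findBadChars(str)
  badChars = []
  pos = 0
  # Find each run of non-ACTG letters, resuming the search where the previous run ended.
  while (m = str.match(/[^ACTG]+/i, pos))
    beg, pos = m.offset(0)         # start and end offsets of the run
    badChars << [beg, pos - beg]   # store as [position, length]
  end
  badChars
end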

puts (findBadChars "AAATCAGAAGTTAAGGCAGTGTTTTAGATGGCTCATTCACAACTATCTTTCCCCTTTAAATATGATTTATTGTCTYRTCTYATACACAGATGTATTGCTTGGTAAAAGAYTRGCCTCCARTCAAACCTGARCAGGCTATGGAACTTCTGGACTGTAATTACCCAGATCCTATGGTTCGARGTTTTGCTGTTCRRGTGCTTGGAAAAATATTTAACWGATRACAAACTTTCTCAGTATTTAATTCARCTAGTACARGtaaaataatgtaaaatagyraataatrtttaattacaataataattta")
Thanks for your answer. I just checked the requirements and realized I left something out: the return value should actually be a single space-separated string of values, e.g. 4,1 12,1 140,2
kusadasi 5 years ago
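
The revised output format can be produced from the array of pairs with map and join. This is a minimal sketch, assuming the pairs are [position, length] arrays like those returned by findBadChars; the method name format_bad_chars is hypothetical and not part of any posted solution:

```ruby
# Hypothetical helper: turns [[4, 1], [12, 1]] into "4,1 12,1".
def format_bad_chars(pairs)
  # Each pair becomes "position,length"; join puts single spaces between them.
  pairs.map { |beg, length| "#{beg},#{length}" }.join(" ")
end

puts format_bad_chars([[4, 1], [12, 1], [140, 2]])  # prints: 4,1 12,1 140,2
```

Using join avoids the trailing space that appending to a string inside a loop would leave behind.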

I think this should work fine: https://gist.github.com/alv-c/54e945dd98cd41784bcb

Basically it's @vanceza's solution, so I would not be mad if you don't give me the bounty :)

Sorry, but I have to award it to @vanceza... but FYI, I had problems getting your solution to work and ended up doing this:

badChars.each do |elem|
  self.log(elem)
  output += "#{elem[0]},#{elem[1]} "
end
kusadasi 5 years ago
No problem, it's fine :)). Whoa, I can't believe I just put "beg, length", haha. I'm a lazy PHP guy.
alv-c 5 years ago
No it's cool... and thanks for your answer. Cheers!
kusadasi 5 years ago
Yet another solution: https://gist.github.com/poserg/5ed5dacdfdc683060d07
poserg 5 years ago
I like this latest one ... more Rubyish
kusadasi 5 years ago
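
For comparison, a compact version can be sketched with String#scan, which walks the string once and exposes each match through Regexp.last_match inside the block. This is an illustration only and not the code from either linked gist; the method name bad_char_runs is hypothetical, and positions are 0-based character offsets as in the winning solution:

```ruby
# Hypothetical method: returns "pos,len pos,len ..." for each run of non-ACGT letters.
def bad_char_runs(str)
  runs = []
  # With a block, scan yields each match in turn and sets Regexp.last_match.
  str.scan(/[^ACGT]+/i) do
    m = Regexp.last_match
    runs << "#{m.begin(0)},#{m.end(0) - m.begin(0)}"
  end
  runs.join(" ")
end

puts bad_char_runs("AAYRTNNA")  # prints: 2,2 5,2
```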