Write a Python regex that detects all-caps phrases

The goal here is to write a Python regex that matches phrases of 3 or more words in all caps that don't start at the beginning of a line.

Here are some test cases:

THIS LINE HAS NO MATCHES BECAUSE THE ALL-CAPS PHRASE STARTS THE LINE.
This LINE has NO MATCHES because THE longest all-caps phrase is ONLY 2 words long
This line has no matches because it has no all-caps phrases.

This line has one match BECAUSE IT HAS SHOUTING.
This line has ONE MATCH because IT HAS ONLY ONE PHRASE with enough words.
THIS LINE HAS only one match because THE OTHER PHRASE starts the sentence.
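The test cases can be written down as data so every candidate pattern is checked the same way. This is a minimal sketch: the expected phrases come from the question and from the output listed in the winning answer, and `CASES`, `failures` and `naive_phrases` are hypothetical names. The naive candidate deliberately ignores the "not at the start of a line" rule, to show which test lines that rule exists for.

```python
import re

# Each test line from the question, paired with the phrases it should yield.
# Assumption: a trailing period is not part of a phrase.
CASES = [
    ("THIS LINE HAS NO MATCHES BECAUSE THE ALL-CAPS PHRASE STARTS THE LINE.", []),
    ("This LINE has NO MATCHES because THE longest all-caps phrase is ONLY 2 words long", []),
    ("This line has no matches because it has no all-caps phrases.", []),
    ("This line has one match BECAUSE IT HAS SHOUTING.", ["BECAUSE IT HAS SHOUTING"]),
    ("This line has ONE MATCH because IT HAS ONLY ONE PHRASE with enough words.",
     ["IT HAS ONLY ONE PHRASE"]),
    ("THIS LINE HAS only one match because THE OTHER PHRASE starts the sentence.",
     ["THE OTHER PHRASE"]),
]


def failures(find_phrases):
    """Run a candidate (str -> list of str) over every case and return the
    (line, expected, got) triples where the result is wrong."""
    bad = []
    for line, expected in CASES:
        got = find_phrases(line)
        if got != expected:
            bad.append((line, expected, got))
    return bad


def naive_phrases(line):
    # Hypothetical naive candidate: 3+ all-caps words anywhere on the line,
    # with no check on what comes before the phrase.
    return re.findall(r"\b[A-Z]+(?:\s+[A-Z]+){2,}\b", line)


for line, expected, got in failures(naive_phrases):
    print(line)
    print("  expected", expected, "got", got)
```

The naive candidate fails the first and last test lines, which are exactly the two where an all-caps phrase opens the line.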

awarded to weslly


3 Solutions


.+([a-z]\b){3,}

I'm on my phone, so I haven't tested it. Maybe it works.
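Since that pattern went untested, here is a quick check, assuming Python 3's `re` module (`lines` is just a few of the question's test cases). It matches nothing: `[a-z]\b` only succeeds when the next character is not a word character, so another `[a-z]` can never follow it and the group cannot repeat even twice, let alone three times.

```python
import re

# The pattern from the answer above, unchanged.
pattern = re.compile(r".+([a-z]\b){3,}")

lines = [
    "This line has one match BECAUSE IT HAS SHOUTING.",
    "This line has ONE MATCH because IT HAS ONLY ONE PHRASE with enough words.",
    "THIS LINE HAS only one match because THE OTHER PHRASE starts the sentence.",
]

for line in lines:
    # [a-z]\b needs a non-word character (or end of string) right after the
    # letter, so ([a-z]\b) can never match twice in a row.
    print(pattern.search(line))  # None
```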

Winning solution

import re

teststring = """
THIS LINE HAS NO MATCHES BECAUSE THE ALL-CAPS PHRASE STARTS THE LINE.
This LINE has NO MATCHES because THE longest all-caps phrase is ONLY 2 words long
This line has no matches because it has no all-caps phrases.

This line has one match BECAUSE IT HAS SHOUTING.
This line has ONE MATCH because IT HAS ONLY ONE PHRASE with enough words.
THIS LINE HAS only one match because THE OTHER PHRASE starts the sentence.
"""

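# [^A-Z]\b       : the phrase must follow a word character that is not A-Z,
#                  e.g. the last letter of a lowercase word.
# (\s[A-Z]+){3,} : three or more all-caps words, each preceded by whitespace.
# re.DOTALL has no effect here because the pattern contains no '.'.
matches = re.findall(r'[^A-Z]\b((\s[A-Z]+){3,})', teststring, re.DOTALL)

for match in matches:
    # findall returns one tuple per match (one entry per group); match[0] is
    # the whole phrase, and [1:] drops its leading whitespace character.
    print(match[0][1:])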

# BECAUSE IT HAS SHOUTING
# IT HAS ONLY ONE PHRASE
# THE OTHER PHRASE
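The winning pattern can be wrapped in a small function. In this sketch (the names `SHOUTING`, `SAME_LINE` and `shouting_phrases` are hypothetical), the inner group is made non-capturing so `re.findall` returns plain strings instead of tuples. One caveat: `\s` also matches a newline, so a phrase that starts a line is still reported when the previous line ends in a lowercase letter. Using `[ \t]` instead of `\s` keeps each phrase on a single line.

```python
import re

# Winning answer's pattern with the inner group non-capturing: findall yields strings.
SHOUTING = re.compile(r"[^A-Z]\b((?:\s[A-Z]+){3,})")
# Variant: [ \t] instead of \s, so a phrase cannot continue across a newline.
SAME_LINE = re.compile(r"[^A-Z]\b((?:[ \t][A-Z]+){3,})")


def shouting_phrases(text, pattern=SHOUTING):
    # Each match begins with the single whitespace character the group consumed;
    # lstrip() removes it, like match[0][1:] in the answer.
    return [m.lstrip() for m in pattern.findall(text)]


print(shouting_phrases("This line has one match BECAUSE IT HAS SHOUTING."))
# ['BECAUSE IT HAS SHOUTING']

print(shouting_phrases("quiet\nTHIS IS LOUD"))             # ['THIS IS LOUD']
print(shouting_phrases("quiet\nTHIS IS LOUD", SAME_LINE))  # []
```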

import re
import pprint

strings = [
    "THIS LINE HAS NO MATCHES BECAUSE THE ALL-CAPS PHRASE STARTS THE LINE",
    "This LINE has NO MATCHES because THE longest all-caps phrase is ONLY 2 words long",
    "This line has no matches because it has no all-caps phrases",
    "This line has one match BECAUSE IT HAS SHOUTING.",
    "This line has ONE MATCH because IT HAS ONLY ONE PHRASE with enough words.",
    "THIS LINE HAS only one match because THE OTHER PHRASE starts the sentence."
]

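for s in strings:
    print("> ", s)
    # The lookbehind requires a lowercase letter, whitespace or non-word
    # character just before the phrase; each of the 3+ repetitions is
    # whitespace followed by capitals or non-word characters (\W includes
    # spaces and punctuation, which is why matches can keep them).
    m = re.search(r'(?<=[a-z\s\W])((?:\s[A-Z\W]+){3,})', s)
    if m:
        pprint.pprint(m.groups())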

Output:

sh-4.2# python main.py
>  THIS LINE HAS NO MATCHES BECAUSE THE ALL-CAPS PHRASE STARTS THE LINE
>  This LINE has NO MATCHES because THE longest all-caps phrase is ONLY 2 words long
>  This line has no matches because it has no all-caps phrases
>  This line has one match BECAUSE IT HAS SHOUTING.
(' BECAUSE IT HAS SHOUTING.',)
>  This line has ONE MATCH because IT HAS ONLY ONE PHRASE with enough words.
(' IT HAS ONLY ONE PHRASE ',)
>  THIS LINE HAS only one match because THE OTHER PHRASE starts the sentence.
(' THE OTHER PHRASE ',)
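`re.search` returns only the first match, so a line with two shouting phrases reports just one. A sketch using `re.finditer` with the same pattern (the function name `all_phrases` and the first sample line are made up for illustration) collects every match. Because `\W` also matches spaces and punctuation, a match can carry surrounding whitespace, as in `' IT HAS ONLY ONE PHRASE '` above; `strip()` removes that, although absorbed punctuation such as a trailing `.` stays.

```python
import re

# The third answer's pattern, unchanged.
PATTERN = re.compile(r"(?<=[a-z\s\W])((?:\s[A-Z\W]+){3,})")


def all_phrases(line):
    # finditer yields every non-overlapping match instead of only the first;
    # strip() trims whitespace that [A-Z\W] absorbed at either end.
    return [m.group(1).strip() for m in PATTERN.finditer(line)]


print(all_phrases("one TWO THREE FOUR five SIX SEVEN EIGHT"))
# ['TWO THREE FOUR', 'SIX SEVEN EIGHT']
print(all_phrases("This line has ONE MATCH because IT HAS ONLY ONE PHRASE with enough words."))
# ['IT HAS ONLY ONE PHRASE']
```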