JavaScript highlight string/text match based on lookup table

For regulatory reasons, we need to highlight terms and phrases in our website content. We will highlight them visually and add a link that opens a new page. Hovering over the highlight will show some text. The link and the hover text are unique per term/phrase.

A couple of thoughts. In the JS we should have a highly optimized dictionary structure, where each entry holds:

String, hover text, URL.
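
One possible reading of that dictionary is a Map keyed by the lowercased term, which gives constant-time lookups once a match is found in the text. This is only a minimal sketch: the rows, the example.com URLs, and the names buildTermTable and lookupTerm are placeholders, not part of any existing code.

```javascript
// Hypothetical rows in the order described above: string, hover text, URL.
const rows = [
  ['Example Corp', 'Hover text for Example Corp', 'https://example.com/corp'],
  ['Sample Fund', 'Hover text for Sample Fund', 'https://example.com/fund'],
];

// Build a Map keyed by the lowercased term so lookups ignore case.
function buildTermTable(entries) {
  const table = new Map();
  for (const [term, hover, url] of entries) {
    table.set(term.toLowerCase(), { term, hover, url });
  }
  return table;
}

// Return the entry for a matched string, or null when it is not a known term.
function lookupTerm(table, matchedText) {
  return table.get(matchedText.toLowerCase()) || null;
}

const termTable = buildTermTable(rows);
console.log(lookupTerm(termTable, 'example corp').url); // https://example.com/corp
```

Lowercasing the key is a design choice for case-insensitive matching; drop it if terms must match exactly as written.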

The JS can handle the highlight style, so we don’t need to worry about another CSS file.

The HTML parsing logic should be fast, so the page can load content quickly without a ton of lag.

We should have a way in the JS to keep the parsing focused on just the body area of the content. I’m thinking the best way to handle this may be to scan the page for a CLASS on a DIV, so the scanning can start there and stay focused inside that div.
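
The scoping idea could be sketched roughly as follows: start from the container element and collect only the text nodes beneath it, skipping script, style, and existing link elements. The class name article-body and the function name collectTextNodes are assumptions made for illustration. The walker only relies on nodeType, nodeName, and childNodes, so it works on real DOM nodes in a browser.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
// Elements whose text should never be highlighted (an assumed list).
const SKIPPED_TAGS = new Set(['SCRIPT', 'STYLE', 'A']);

// Recursively collect text nodes under root, staying inside that subtree.
function collectTextNodes(root, out = []) {
  for (const child of Array.from(root.childNodes || [])) {
    if (child.nodeType === TEXT_NODE) {
      out.push(child);
    } else if (!SKIPPED_TAGS.has(child.nodeName)) {
      collectTextNodes(child, out);
    }
  }
  return out;
}

// Browser usage: '.article-body' is a hypothetical class on the content DIV.
if (typeof document !== 'undefined') {
  const container = document.querySelector('.article-body');
  if (container) {
    console.log(collectTextNodes(container).length + ' text nodes to scan');
  }
}
```

Skipping A elements is one simple way to avoid nesting a new link inside an existing one.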

One thing to know: we have pages like this one that dynamically load the next article, so the parser would need to run again as each new article loads. https://www.fastcompany.com/90316217/lights-cacti-store-freezers-and-other-random-things-watching-us-now

This will end up in an AngularJS site, so it would be appreciated if the libraries and code could be limited to exclude jQuery.

awarded to Wuddrum
Tags
javascript


3 Solutions


You can try mark.js (https://markjs.io/) or use this function:

/**
 * Highlight keywords inside a DOM element
 * @param {Element} elem Element to search for keywords in
 * @param {string[]} keywords Keywords to highlight
 * @param {boolean} caseSensitive Differentiate between capital and lowercase letters
 * @param {string} cls Class to apply to the highlighted keyword
 */
function highlight(elem, keywords, caseSensitive = false, cls = 'highlight') {
  // The 'i' flag makes matching case-insensitive, so add it only when caseSensitive is false.
  const flags = caseSensitive ? 'g' : 'gi';
  // Sort longer matches first to avoid
  // highlighting keywords within keywords.
  keywords.sort((a, b) => b.length - a.length);
  Array.from(elem.childNodes).forEach(child => {
    const keywordRegex = RegExp(keywords.join('|'), flags);
    if (child.nodeType !== 3) { // not a text node
      highlight(child, keywords, caseSensitive, cls);
    } else if (keywordRegex.test(child.textContent)) {
      const frag = document.createDocumentFragment();
      let lastIdx = 0;
      child.textContent.replace(keywordRegex, (match, idx) => {
        const part = document.createTextNode(child.textContent.slice(lastIdx, idx));
        const highlighted = document.createElement('span');
        highlighted.textContent = match;
        highlighted.classList.add(cls);
        frag.appendChild(part);
        frag.appendChild(highlighted);
        lastIdx = idx + match.length;
      });
      const end = document.createTextNode(child.textContent.slice(lastIdx));
      frag.appendChild(end);
      child.parentNode.replaceChild(frag, child);
    }
  });
}

Highlight all keywords found in the page:


highlight(document.body, ['lorem', 'ipsum', 'dol']);

You'll also need to add this to your CSS:


.highlight {
  background: lightpink;
}
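
One caveat with the highlight() function above: the keywords are joined into a regular expression as-is, so a keyword containing characters such as +, ., or ( would be interpreted as regex syntax. A possible workaround, sketched here with the hypothetical helpers escapeRegExp and buildKeywordRegex, is to escape each keyword before joining. The sketch assumes a non-empty keyword list.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one alternation regex from the keywords, longest first so that a
// longer keyword wins over a shorter keyword it contains.
function buildKeywordRegex(keywords, caseSensitive = false) {
  const sorted = [...keywords].sort((a, b) => b.length - a.length);
  const pattern = sorted.map(escapeRegExp).join('|');
  return new RegExp(pattern, caseSensitive ? 'g' : 'gi');
}

const regex = buildKeywordRegex(['C++', 'U.S.']);
console.log('We ship C++ tools in the U.S.'.match(regex)); // [ 'C++', 'U.S.' ]
```

Copying the array before sorting also leaves the caller's keyword list untouched.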


Here is some code I've come up with in plain JavaScript:

You can check a CodePen demo here: https://codepen.io/kostasx/pen/OqBKgP?editors=0010

The core functionality is inside the searchAndReplace() function.

There is also a code section which dynamically adds a paragraph after 5 seconds, containing some terms.

Running searchAndReplace() again makes the highlighter work on that new content too.

I've used the .highlighted class to add some custom CSS and, for now, the title attribute for the hover text.
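
Because the hover text ends up inside a title attribute, a hover text containing quotes or angle brackets could break the generated markup. The following is a rough sketch of how the anchor string could be built with escaping. The helper names escapeHtml and buildTermLink are hypothetical and are not part of the CodePen, and rel="noopener" is an extra precaution for target="_blank" links.

```javascript
// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the highlighted anchor for a term; entry has the same shape as the
// values in the terms object of the listing below ({ hover, link }).
function buildTermLink(term, entry) {
  return '<a href="' + escapeHtml(entry.link) + '"' +
    ' title="' + escapeHtml(entry.hover) + '"' +
    ' class="highlighted" target="_blank" rel="noopener">' +
    escapeHtml(term) + '</a>';
}

console.log(buildTermLink('Google', {
  hover: 'The "Google" Company',
  link: 'https://google.com'
}));
```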

Let me know if you want me to dig deeper and work on the code some more.

    <!DOCTYPE html>
    <html lang="en">

    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
        <title>Highlight</title>

        <!-- Loader Styling -->
        <style>
            .lds-ripple {
                position: relative;
                text-align: center;
                margin: 60px auto 0 auto;
                width: 64px;
                height: 64px;
            }
            .lds-ripple div {
                position: absolute;
                border: 4px solid #000;
                opacity: 1;
                border-radius: 50%;
                animation: lds-ripple 1s cubic-bezier(0, 0.2, 0.8, 1) infinite;
            }
            .lds-ripple div:nth-child(2) {
                animation-delay: -0.5s;
            }
            @keyframes lds-ripple {
                0% {
                    top: 28px;
                    left: 28px;
                    width: 0;
                    height: 0;
                    opacity: 1;
                }
                100% {
                    top: -1px;
                    left: -1px;
                    width: 58px;
                    height: 58px;
                    opacity: 0;
                }
            }

        </style>

        <!-- Styling -->
        <style>
            body {
                margin-top: 50px;
                font-family: Arial, Helvetica, sans-serif;
            }

            .container {
                margin: 0 auto;
                max-width: 960px;
                font-size: 1rem;
                line-height: 2;
            }

            .highlighted {
                background: indigo;
                text-decoration: none;
                color: white;
                padding: 6px;
            }
        </style>

    </head>

    <body>

        <div class="container">

            <p>We have a need, for regulatory reasons, to highlight terms and phrase in our website content. We will
                highlight them visually and add a link which will open a new page. On hover of the highlight we will show
                some text. Example of term: Facebook. The link and the hover text are unique per term/phrase.</p>

            <p>Couple of thoughts. In the js we should have some highly optimized dictionary structure.</p>

            <p>String, hover text, url. Another term here: Google. And some other text.</p>

            <p>The js can handle the highlight style so we don’t need to worry about another css file.</p>

            <p>The html parsing logic should be pretty fast so it can log. Content quickly without a ton of lag. An already highlighted term here: <a href="#" class="highlighted">Google</a></p>

            <p>One thing to know we have pages like this with dynamic loading of next articles. Yet, another term at this
                point: Microsoft. We would need the parser to run again as the next article loads dynamically.</p>

            <!-- Loader -->
            <div class="lds-ripple"><div></div><div></div></div>

        </div>

        <script>
            // Terms: Term + hover Text + URL
            let terms = {
                "Facebook": {
                    hover: "The Facebook Company",
                    link: "https://facebook.com"
                },
                "Google": {
                    hover: "The Google Company",
                    link: "https://google.com"
                },
                "Microsoft": {
                    hover: "The Microsoft Company",
                    link: "https://microsoft.com"
                }
            }

            // Find all Text Nodes inside another Node:
            function findTextNodes( node ) {
                let all = [];
                for ( node = node.firstChild; node; node = node.nextSibling ) {
                    if ( node.nodeType === 3 ) {
                        all.push( node );
                    } else { 
                        all = all.concat( findTextNodes( node ) ); 
                    }
                }
                return all;
            }

            // Function that checks if a Text Node contains a term
            function doesTextNodeContainsTerm( terms, targets, textNode  ){

                Object.keys( terms ).forEach( function( term ){

                    // Check if ParentNode is <script>
                    let isParentNodeScript = textNode.parentNode.nodeName === "SCRIPT";

                    let containsTerm = textNode.textContent.indexOf( term ) > -1;
                    // Check if Text Node contains the term and it is not a <script> Tag:
                    if ( containsTerm && !isParentNodeScript ){
                        let parentEl = textNode.parentElement;
                        let isParentElementHighlighted = parentEl.classList.contains("highlighted");
                        if ( !isParentElementHighlighted ){
                            targets.push( [ parentEl, term, terms[term] ] );
                        }
                    }

                });
            }

            // Map over elements that contain a term and add a highlighted class and place the term inside an anchor Tag.
            function formatTerm( entry ){

                let $parentEl = entry[0];
                let parentElHTML = entry[0].innerHTML;
                let term = entry[1];
                $parentEl.innerHTML = parentElHTML.replace( term, `<a href="${entry[2].link}" title="${entry[2].hover}" class="highlighted" target="_blank">${term}</a>` ); 

            }

            function searchAndReplace( terms ){

                // Find all Text Nodes and place them in a variable
                let textNodes = findTextNodes( document.body );
                let targets = [];
                // Loop over Text Nodes and check to see if they contain a term
                textNodes.forEach( doesTextNodeContainsTerm.bind( null, terms, targets ) );
                // Loop over Text Nodes that contain a term and format them appropriately
                targets.forEach( formatTerm );

            }

            // Initialize our code when the DOM Content has been loaded:
            document.addEventListener( "DOMContentLoaded", function(){
                searchAndReplace( terms );
            });

            // Dynamically add content after 5 seconds and run the searchAndReplace() command to find terms in the newly added elements:
            setTimeout( function(){

                let $p = document.createElement( "p" );
                $p.textContent = "We should have a way in the js to keep the parsing focused on just the body area of the content. Another occurrence of the Facebook term here and Google here. I'm thinking maybe the best way to handle this is to scan the page for a CLASS on a DIV and the scanning can start there and stay focused inside that div.";
                document.querySelector(".lds-ripple").remove();
                document.querySelector(".container").appendChild( $p );

                searchAndReplace( terms );

            }, 5000 );


        </script>


    </body>

    </html>
This is a great solution. Do you think we could work in a tooltip like this one? https://kazzkiq.github.io/balloon.css/
Qdev 3 months ago
I've updated the codepen with CSS styling from the reference link. You can check it out and let me know.
kostasx 3 months ago
Winning solution

Hey, here's my approach:

Demo: https://wuddrum.github.io/js-text-highlighter/

Source: https://github.com/Wuddrum/js-text-highlighter

Description

This approach uses a MutationObserver, which makes it possible to scan for highlightable text only in freshly inserted nodes. You can open the demo and scroll endlessly (or until your browser can't handle that much content) without the highlighter script ever slowing down, because it never rescans the whole document. It scans only the nodes that have just been inserted.
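
The general idea can be sketched as follows. This is not the code from the repository above, only a minimal illustration: collectInsertedElements, observeContainer, and the class names container and article are assumptions. The filtering step is kept separate from the observer so it can be exercised without a browser.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in browsers

// From a batch of mutation records, keep only inserted elements that carry
// at least one whitelisted class.
function collectInsertedElements(mutations, classWhitelist) {
  const found = [];
  for (const mutation of mutations) {
    for (const node of Array.from(mutation.addedNodes || [])) {
      if (node.nodeType !== ELEMENT_NODE) continue;
      const classes = Array.from(node.classList || []);
      if (classes.some((name) => classWhitelist.includes(name))) {
        found.push(node);
      }
    }
  }
  return found;
}

// Watch a container and call onInsert for each newly inserted, whitelisted element.
function observeContainer(container, classWhitelist, onInsert) {
  const observer = new MutationObserver((mutations) => {
    collectInsertedElements(mutations, classWhitelist).forEach(onInsert);
  });
  observer.observe(container, { childList: true, subtree: true });
  return observer;
}

// Browser usage with hypothetical class names.
if (typeof document !== 'undefined' && typeof MutationObserver !== 'undefined') {
  const container = document.querySelector('.container');
  if (container) {
    observeContainer(container, ['article'], (el) => {
      console.log('New article inserted, scan only this element:', el);
    });
  }
}
```

Scanning only the inserted element keeps the cost proportional to the new content rather than to the whole page.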

While this works in all the latest browsers, including IE11, mind that it doesn't work in IE10.

Usage of the highlighter

var highlighter = new Highlighter(classToObserve, insertedClassWhitelist, textContainerClassWhitelist, highlights);

classToObserve: Class of the general element that holds any dynamically added content. e.g. container

insertedClassWhitelist: An array of classes for the elements that get dynamically inserted. This is used as a first measure to not highlight unwanted text. Example: ['article']

textContainerClassWhitelist: An array of classes for the elements that hold highlightable text. This is used as a second measure to not highlight unwanted text. Example: ['article-body']

highlights: An array of highlight objects (arrays) that hold the information about each highlight. Example: [ ['Some Company', 'A popup about some company', 'http://google.com'] ]

Afterwards execute highlighter.run() and you're all set.

Edit: I made a number of improvements to the code. I removed the balloon.css dependency in favor of custom tooltip CSS based on balloon.css' appearance. The required CSS is now injected from JS, so no additional links to .css files are needed.
