CLI script to index all html files in directory to lunr.js compatible .json file

I need a script that can be run from the CLI and from a cron job. It should loop recursively through every HTML file in a target directory, including subdirectories, and index them into a lunr.js-compatible .json file.

Every single HTML file is pretty much set up the following way:

Then minimize/compress this .json file so that it is as small as possible.

It seems lunr.js requires a URL for each json object. Like this:

url: '/path-of-html-file',

So I need the script to be able to get the full local path of the html and use that as the value for 'url':

Bonus: I am also willing to tip anyone who can tell me how to have lunr.js load this .json file so that I can use lunr.js to search through it.


hey, i saw the document that you want to index, but i'm not sure which fields you want to index. do you want the whole body of the document, or only specific parts?
andijcr 4 years ago
Hey I want everything in the demo file
user0809 4 years ago
awarded to iurisilvio


3 Solutions

As you can see on the lunr.js site, you have to declare the fields you want to search through and set their values for each document (HTML page) in one JSON file.

To make things simpler, there is this Jekyll plugin that automatically creates the lunr.js JSON file.

But how do you actually display the search results? All you get back is an ID and a weight.
user0809 4 years ago
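One common way to handle this, sketched under assumptions: keep the original documents in a lookup table keyed by the same value used as the lunr ref, then map each result back to its document. The results array below is a hand-written stand-in for what a lunr search returns (objects with ref and score); resolveResults and the 'title' field are hypothetical names.

```javascript
// Map lunr-style results back to the documents they refer to.
// `results` is assumed to look like [{ ref: '/a.html', score: 0.5 }, ...].
function resolveResults(results, documents) {
  const byUrl = new Map(documents.map((doc) => [doc.url, doc]));
  return results
    .filter((r) => byUrl.has(r.ref)) // skip refs with no matching document
    .map((r) => ({ url: r.ref, title: byUrl.get(r.ref).title, score: r.score }));
}

// Made-up documents; in practice these would come from the same data
// that was fed into the index.
const documents = [
  { url: '/about.html', title: 'About' },
  { url: '/posts/lunr.html', title: 'Searching with lunr.js' },
];

// Hand-written stand-in for the output of a search call.
const results = [{ ref: '/posts/lunr.html', score: 0.42 }];

for (const hit of resolveResults(results, documents)) {
  console.log(hit.title + ' -> ' + hit.url);
}
```

The point is only that the index itself stores refs and weights, so the titles and URLs shown to the user have to come from a separate copy of the documents.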
Also, I have several separate Jekyll instances running, and I combine their output into a single folder. Using the plugin would mean I'd have to combine the lunr.js JSON files again.
user0809 4 years ago
Hey, so are you able to help me out with the script?
user0809 4 years ago
You should follow the steps mentioned in the GitHub repo. I'm not sure I understand what you said about the multiple Jekyll instances. For the error, you have to run gem install nokogiri json. I could write the script you asked for, but I'm not sure you'll get the results you want with lunr.js.
tomtoump 4 years ago
The problem is that even after I managed to get it to work, the output JSON file doesn't contain everything I need. It only takes the title + URL + body from the markdown files. I need everything in the front matter. As long as the script you write creates a JSON file from everything in the demo HTML file and fits the problem description, I'll be happy.
user0809 4 years ago
Winning solution

I solved it using the lunr-index-build package. First, you have to install it: npm install -g lunr-index-build

You can download it here:

Run this command: python --input /your/input/directory --output index.json

It'll generate your lunr index in index.json.

so, i wrote this bash script. the only other requirement is to have node.js installed.
you can change the values of the lunr= and lunrIndexOutput= variables if you don't like my defaults.

save the script in a file called and invoke it with a directory to scan. for example bash /home/andrea/blog will scan my blog directory for html files, and write a lunr index file with the name index.json

to install the dependencies, install nodejs and npm via sudo apt-get install nodejs npm, set the correct registry with sudo npm config set registry, then install lunr via sudo npm install lunr.


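#!/bin/bash

#if you have not installed the module lunr via npm, put the complete path of your lunr module here
lunr="lunr"

#put the name of your output file here
lunrIndexOutput="index.json"

#absolute path of the directory to scan (first argument)
dir=`readlink -f "$1"`

#temporary file holding the list of html files
#(assumed: this assignment is missing from the post as scraped)
entriesFileNames=`mktemp`

#find recurses into every sub directory
find "$dir" -type f -name "*.html" 2>/dev/null > "$entriesFileNames"

script="
var lunr = require('$lunr')
var fs = require('fs')
//to account old version of nodejs
fs.existsSync = fs.existsSync || require('path').existsSync;

var idx = lunr(function () {
    //reconstructed: field and ref names inferred from the idx.add call below
    this.field('body')
    this.ref('id')
})

var entries=fs.readFileSync('$entriesFileNames')
var filenames=entries.toString().split('\n')

filenames.forEach(function (file) {
  if(fs.existsSync(file)){
    var data=fs.readFileSync(file)
    idx.add({id: file, body: data.toString()})
  }
})

fs.writeFile('$lunrIndexOutput', JSON.stringify(idx), function (err) {
    if (err) throw err
})
"
node -e "$script"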
Okay let me try this
user0809 4 years ago
Btw, can this index.json file be minified to the smallest file size possible?
user0809 4 years ago
You mentioned in the first sentence 'the only other requirement is to have node.js installed'. But in the script I see '#if you have not installed the module lunr via npm, put the complete path of your lunr module here lunr="lunr"'. Do I need to install a 'lunr' module as well as node.js?
user0809 4 years ago
yes, you can install it via sudo npm install lunr. or you can download lunr.js from here and save it (remember to change the lunr= variable).
andijcr 4 years ago
the index.json is already minified. further compression should be done by your webserver
andijcr 4 years ago
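To illustrate the point about minification versus compression, here is a small Node.js sketch using only built-in modules. The index object is made-up stand-in data, not a real lunr index. JSON.stringify without an indent argument already emits compact JSON, and gzip (which a web server would normally apply on the fly) shrinks it further.

```javascript
const zlib = require('zlib');

// Made-up stand-in for the parsed contents of index.json.
const index = {
  fields: ['body'],
  docs: Array.from({ length: 200 }, (_, i) => ({
    id: '/page-' + i + '.html',
    body: 'lorem ipsum dolor sit amet',
  })),
};

// Indented output, for comparison only.
const pretty = JSON.stringify(index, null, 2);
// Compact output: no indent argument means no extra whitespace.
const compact = JSON.stringify(index);
// Gzip the compact form, roughly what a web server would send.
const gzipped = zlib.gzipSync(Buffer.from(compact, 'utf8'));

console.log('pretty:  ' + Buffer.byteLength(pretty) + ' bytes');
console.log('compact: ' + Buffer.byteLength(compact) + ' bytes');
console.log('gzipped: ' + gzipped.length + ' bytes');
```

The exact numbers depend on the data; the sketch only shows that the two steps are separate and that the second one is usually left to the server.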
By 'lunr' module do you mean the lunr.js plugin? Or is this another module I need to install? I can't seem to find anything on Google about a 'lunr module'.
user0809 4 years ago
i edited the solution to help you install the dependencies
andijcr 4 years ago
I get 'npm ERR! Error: failed to fetch from registry: lunr' and other errors when trying to install the lunr module
user0809 4 years ago
you probably have an old version of npm installed. if you are on a server, that's the most probable cause. it's easier to download lunr.min.js, save it in the same folder as this script, and change the lunr= variable to lunr="./lunr.min.js"
andijcr 4 years ago
I manually copied over the lunr.min.js file and ran the script, but I get:

undefined:15 if(fs.existsSync(file)){ ^ TypeError: Object # has no method 'existsSync' at eval at (eval:1:82) at (native) at Object. (eval at (eval:1:82)) at Object. (eval:1:70) at Module._compile (module.js:441:26) at startup (node.js:80:27) at node.js:555:3

Does this script fetch recursively through all folders in a directory?
user0809 4 years ago
i tracked down the bug by launching a fresh installation of ubuntu. ubuntu ships an old version of nodejs, so i updated the solution to work with both the old and the more recent versions. This should work! :D
andijcr 4 years ago