Code for MongoDB Stitch function or AWS Lambda to render an html page to text

We need a serverless JavaScript solution to screen-scrape the body text from a non-REST web page. The URL is "https://www.cancer.gov/about-cancer/treatment/drugs/trastuzumab" or "https://www.cancer.gov/about-cancer/treatment/drugs/" where "arg" is passed to the function. In this example, you'll see several large, bold sections: the drug name, "Use in Cancer", "More about ", "Research Results and Related Resources", and "Clinical Trials Accepting Patients". We'd like to get each of these blocks as a separate returned string. The resulting strings can contain li, br, etc. for formatting the results. We're in a hurry, so this will probably be awarded within the next day or two.
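As a rough illustration of the URL scheme described above, the argument could be appended to the base path like this. This is only a sketch: `buildDrugUrl` and `DRUG_BASE_URL` are hypothetical names, and the rule that a drug slug contains only lowercase letters, digits, and hyphens is an assumption, not something the site documents.

```javascript
// Base path taken from the question; the drug name is appended to it.
const DRUG_BASE_URL = 'https://www.cancer.gov/about-cancer/treatment/drugs/';

// Hypothetical helper: normalizes the argument and builds the page URL.
// Assumption: slugs are lowercase letters, digits, and hyphens only.
function buildDrugUrl(arg) {
  const slug = String(arg).trim().toLowerCase();
  if (!/^[a-z0-9-]+$/.test(slug)) {
    throw new Error('Unexpected drug name: ' + arg);
  }
  return DRUG_BASE_URL + slug;
}
```

Rejecting anything outside the assumed slug pattern keeps a caller from steering the request to a different path on the site.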

Is a serverless function deployed to now.sh valid?
alv-c 23 days ago
Could you please provide the JSON format in which you want the data returned?
alv-c 23 days ago
awarded to meo


2 Solutions


I've created a MongoDB Stitch function and a service to do this. To walk you through it in detail, I think we should do a call, since MongoDB Stitch requires a fair amount of configuration. Here is what I have done so far:

  1. Created a Stitch HTTP Service called httpServ and configured a simple rule for it (see screenshots). Service name and type: https://ibb.co/ck7k22X Service configuration: https://ibb.co/fpZ6216
  2. Created a serverless function called getDrugData with the following code: https://pastebin.com/hsgkxzWz. I know the code is a little messy, but I couldn't find a way to use an npm package in a Stitch function (I couldn't even use DOMParser).
  3. Created a client with the following code: https://pastebin.com/D1b107dX Note that cancer-research-bountify-swkvs is the App name that was automatically generated by MongoDB Stitch.
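For readers who cannot open the pastebin, here is a minimal sketch of how HTML can be split into sections without DOMParser or an npm package. It is not the code from the link above. `splitByHeadings` is a hypothetical name, and the assumption that every section starts with an `<h2>` element is mine; the real page markup may differ.

```javascript
// Splits an HTML string into sections keyed by heading text.
// Assumption: each section starts with an <h2> element.
function splitByHeadings(html) {
  const headingPattern = /<h2[^>]*>([\s\S]*?)<\/h2>/gi;
  const headings = [];
  let match;
  // Record where each heading starts and where its body begins.
  while ((match = headingPattern.exec(html)) !== null) {
    headings.push({
      title: match[1].replace(/<[^>]+>/g, '').trim(), // drop inline tags in the title
      start: match.index,
      bodyStart: headingPattern.lastIndex,
    });
  }
  // A section body runs from the end of its heading to the next heading.
  const sections = {};
  headings.forEach((heading, i) => {
    const end = i + 1 < headings.length ? headings[i + 1].start : html.length;
    sections[heading.title] = html.slice(heading.bodyStart, end).trim();
  });
  return sections;
}
```

The section bodies keep their inner markup (li, br, links), which matches the request that the returned strings may contain formatting tags. Regex-based splitting is fragile against markup changes, so a real parser is preferable wherever the runtime allows one.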

The response of the function follows this format:

{
    "brands": string,
    "clinicalTrials": string,
    "combination": string,
    "fdaApproved": string,
    "more": string,
    "use": string,
    "relatedResources": string
} 
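A client could check a response against this shape before using it. The following is only a sketch: `DRUG_DATA_FIELDS` and `missingDrugDataFields` are hypothetical names, and the field list is copied from the format above.

```javascript
// Field names copied from the response format above.
const DRUG_DATA_FIELDS = [
  'brands',
  'clinicalTrials',
  'combination',
  'fdaApproved',
  'more',
  'use',
  'relatedResources',
];

// Hypothetical helper: returns the names of fields that are absent or not strings.
function missingDrugDataFields(data) {
  return DRUG_DATA_FIELDS.filter((field) => typeof data[field] !== 'string');
}
```

An empty array means the response has all seven string fields.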

If you give me enough time, some fields can be formatted better; for example, brands could be an array of strings.

Example response for alemtuzumab https://pastebin.com/bsvgNdFD

Example response for abvd https://pastebin.com/r506rbQv

Hope this is what you want.

Open to changes!

Edit: Updated the code; one field was missing from the response.

This is nicely done. It's an elegant solution.
billsouthworth 22 days ago
Winning solution
Tipped
Could you please update this to leave the external links in the text intact and also put in the brand names and FDA approval section at the beginning? Thanks. Nice Solution.
billsouthworth 22 days ago
@billsouthworth: I have updated the source code and added the external links and the FDA approval section. Please check: https://get-cancer-drugs.hackermeo.now.sh/api/info?drug=melphalan
meo 22 days ago