Scrape Earnings Call Transcripts from seekingalpha.com - R code

https://bountify.co/scrape-earnings-conference-call-transcripts

Essentially the same problem as the one posted above. The posted solution does not work for me.

Deliverable: Code that crawls SeekingAlpha's conference call webpages (or another repository of conference calls), captures each unique conference call transcript, and saves its contents along with metadata, each transcript in its own .txt or .csv file.

The code can be written in any programming language of your choosing but MUST include a wrapper for R (if written in Python, this can be done with the reticulate package).


1 Solution


Hi!

I've programmed the scraper. You can find the files here.

You can install the Python script's dependencies by running:

pip3 install requests
pip3 install beautifulsoup4

The scraper is written in Python, and the test.r file contains example R code that invokes the scraper through the reticulate package.

The Python file contains two functions, scrape_article and scrape_articles: the first scrapes a single article given its URL, and the second scrapes pages that list articles.
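The linked files are not reproduced in this post, so the following is only a rough sketch of how the two functions might fit together, not the actual code. It assumes that a listing page links to transcripts through hrefs starting with /article/ (a hypothetical prefix), that the page fetcher is passed in as a `fetch` parameter (also hypothetical) so the sketch can be tried without network access, and it uses the standard library's html.parser in place of BeautifulSoup to stay dependency-free.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects unique href values of <a> tags that start with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(self.prefix) and href not in self.links:
                self.links.append(href)


class _TextCollector(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def scrape_article(url, fetch):
    """Return url, title and body text for one article; fetch(url) returns HTML."""
    parser = _TextCollector()
    parser.feed(fetch(url))
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"url": url, "title": parser.title.strip(), "text": body}


def scrape_articles(listing_url, fetch, link_prefix="/article/"):
    """Scrape every article linked from a listing page (assumed link prefix)."""
    parser = _LinkCollector(link_prefix)
    parser.feed(fetch(listing_url))
    return [scrape_article(urljoin(listing_url, href), fetch) for href in parser.links]
```

With the requests package installed, fetch could be something like lambda u: requests.get(u, timeout=30).text. The real site's markup and any login or rate-limit handling are not covered here.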

The scraper writes the articles' content to the scraped-articles folder, whose path is relative to the directory containing the Python script.
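For anyone wondering where the output ends up when the script is called from R, here is a small sketch of resolving a folder next to the script rather than next to the current working directory. The function names, the file-naming scheme, and the metadata header layout are all assumptions made for illustration and may differ from what the actual scraper does.

```python
import re
from pathlib import Path


def output_dir(script_file, folder_name="scraped-articles"):
    """Return (and create if needed) the output folder next to the script file."""
    out = Path(script_file).resolve().parent / folder_name
    out.mkdir(parents=True, exist_ok=True)
    return out


def save_article(article, out_dir):
    """Write one article dict to a .txt file: metadata lines first, then the body."""
    # Build a filesystem-safe file name from the title (hypothetical scheme).
    slug = re.sub(r"[^A-Za-z0-9]+", "-", article["title"]).strip("-").lower() or "article"
    path = Path(out_dir) / f"{slug}.txt"
    header = f"URL: {article['url']}\nTitle: {article['title']}\n\n"
    path.write_text(header + article["text"], encoding="utf-8")
    return path
```

Inside the scraper script this would be called as output_dir(__file__), so the folder location does not depend on the working directory of the R session that invokes it.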

Let me know if you need any more changes.

Thank you,
Vladimir