I'd like a scraper for Seeking Alpha (

Seeking Alpha has a profile for each of its members (you can search for them in the top right corner of the website).

I'm looking for profiles that contain any of the following words:
- hedge funds
- former
- ex
- portfolio manager
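
A minimal sketch of how the keyword check could work, assuming matches are case-insensitive and limited to whole words, so that "ex" does not match inside words like "expert". The names `KEYWORDS` and `matchKeywords` are hypothetical, and counting the singular "hedge fund" as a match for "hedge funds" is an assumption.

```javascript
// Hypothetical keyword list taken from the request above.
// Each pattern uses \b word boundaries so short terms such as "ex"
// only match as standalone words ("ex-trader"), not inside "expert".
const KEYWORDS = [
  { label: 'hedge funds', pattern: /\bhedge funds?\b/i }, // assumption: singular also counts
  { label: 'former', pattern: /\bformer\b/i },
  { label: 'ex', pattern: /\bex\b/i },
  { label: 'portfolio manager', pattern: /\bportfolio managers?\b/i },
];

// Returns the labels of all keywords found in a profile's text.
function matchKeywords(profileText) {
  return KEYWORDS
    .filter(({ pattern }) => pattern.test(profileText))
    .map(({ label }) => label);
}

// Made-up profile texts, for illustration only.
console.log(matchKeywords('Ex-trader and former portfolio manager at a hedge fund.'));
console.log(matchKeywords('Expert in retail stocks.'));
```

The length of the returned array would be the keyword count for that profile.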

Here are two examples of posts that I'd like to see:

The deliverable is a CSV with the following columns:
Column 1: User Name
Column 2: URL of profile
Column 3: How many keywords does it match (1, 2, 3, or 4)?
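
To make the expected layout concrete, here is a rough sketch of how one row of that CSV could be built. The helper names `csvField` and `toCsvRow`, the header text, and the example profile are all hypothetical; the URL is a placeholder, not a real profile.

```javascript
// Quote a CSV field and double any embedded quotes (RFC 4180 style).
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Build one output row: user name, profile URL, keyword count.
// `matched` is the list of keywords found in the profile.
function toCsvRow(userName, profileUrl, matched) {
  return [csvField(userName), csvField(profileUrl), matched.length].join(',');
}

// Hypothetical example data, for illustration only.
const rows = [
  toCsvRow('Jane "JD" Doe', 'https://example.com/author/jane-doe', ['former', 'ex']),
];
const header = 'User Name,URL of profile,Keyword matches';
console.log([header].concat(rows).join('\n'));
```

Quoting the name and URL fields keeps commas or quotes inside a user name from shifting the columns.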

I only see 50 "Premium" Authors. Are there more, or is this the list to work with at this time? If so, I would expect that this list will grow and you will want to filter future results against past ones. Can you confirm these points (only 50 authors, and filtering future authors against past ones)?
TGIFriday over 3 years ago
Hi - I am not looking for a scraper for those 50. I'm looking for a crawler with the above keywords using the search box. Unfortunately, I don't see a directory of all the profiles on Seeking Alpha but I see If you crawled all profiles from a through z and searched for those keywords, I'm looking for CSV output of that. Does that help?
sk11 over 3 years ago
yes, thank you.
TGIFriday over 3 years ago
The terms of service appear to prohibit this use of data, so I will pass on this project. Thank you for the feedback and the opportunity to assist.
TGIFriday over 3 years ago
awarded to johnmurch

1 Solution

Winning solution

I apologize in advance, as this is my first Bountify. Although it says not to provide links to GitHub, I figured a link would make the script easier to run. I wrote it as a Node.js script.

To find as many authors as I could, I pulled the Top 100 from each section from

In addition, I added a column 4 that lists the keywords found in the profile.

Code and Results:

    //How to Run
    //1. Install Node.js and NPM
    //2. Install Packages
    //Required Packages
    //npm install request
    //npm install async
    //npm install underscore
    //npm install cheerio
    //3. node scrapeSeekingAlpha.js
    //This should generate results.csv


    "Prudent Finances",,1,"ex"
    var fs = require('fs'),
        request = require('request'),
        async = require('async'),
        _ = require('underscore'),
        cheerio = require('cheerio');

    var outputCSV = [];

    // EACH CATEGORY URL (Top 100)
    var urls = [
    var authors = [];

      request(url, function(error, response, body) {
          if (!error && response.statusCode == 200) {
            var $ = cheerio.load(body);
            $('.ld_top_list li a').each(function(i, elm) {          
            $('.ld_more_list li a').each(function(i, elm) {          
            console.log("FETCH: "+ url);
            console.log("Check Connection: Issue viewing: "+ url);
      var uniqAuthors = _.uniq(authors);

      //@todo - If you want to output each author to text file
      // var wstream = fs.createWriteStream('authors.txt',{'flags': 'a'});
      // async.forEach(uniqAuthors,function(a,acb){
      //   wstream.write(a+"\n");
      // },function(){
      //   wstream.end();
      //   console.log("DONE");
      //   process.exit(0);
      // });

          var url = '' + name;
          request(url, function(error, response, body) {
              if (!error && response.statusCode == 200) {
                console.log("Fetching: "+url);
                var $ = cheerio.load(body);
                var author = $("#author_full_name").text();
                var profile = $('.profile_item_mini_text').text().toLowerCase();
                // console.log(name+"|"+profile);
                var boolHedgeFund = contains(profile, 'hedge fund');
                var boolFormer = contains(profile, 'former');
                // Match "ex" as a whole word only, so "expert" or "index" do not count
                var boolEx = /\bex\b/.test(profile);
                var boolPortfolioManager = contains(profile, 'portfolio manager');

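                // Count the matched keywords and record which ones matched
                var count=0;
                var keywords =[];
                if(boolHedgeFund){count++;keywords.push('hedge fund')}
                if(boolFormer){count++;keywords.push('former')}
                if(boolEx){count++;keywords.push('ex')}
                if(boolPortfolioManager){count++;keywords.push('portfolio manager')}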

                var capture = {
                  // profile: profile,
          var wstream = fs.createWriteStream('results.csv',{'flags': 'w'});

    // Returns true if string r contains substring s
    function contains(r,s){
      return r.indexOf(s) !== -1;
    }
Hi, thanks for your work! I am seeking a broader search result on all members (i.e. not from the top 100 or top writers). Is that possible?
sk11 over 3 years ago
Basically, this script would work if I had a full list of authors and/or users. Also, please note that there is a difference between authors and users, e.g. the URL -. vs - I used the top 100 authors of each section, as I haven't found a way to find all of the authors or users linked from the site.
johnmurch over 3 years ago