Python config file to do batch tasks in InstaLooter

Using instaLooter (instalooter docs)

I'd like a Python file that does the following for all users listed in the config file (see batch mode docs):

  • Limits the scraping to past 30 days
  • Only download the 3 images with the most likes and comments combined

Not sure if it's possible, but if a .csv file could be used instead of a config file, that would be even better.

Can you give an example of the csv file?
thelostt 1 month ago
Like this here
mattmg83 1 month ago
Only pictures or do you need videos too?
thelostt 1 month ago
pictures only
mattmg83 1 month ago
awarded to thelostt


1 Solution


This solution was tested with Python 3.6.2.

Assuming the script is named scrap.py, run it like so:

scrap.py --config profiles.csv

This will download the top 3 pictures (within the last 30 days) from every profile listed in the CSV file, putting them in their destination directories (also taken from the CSV file).
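Each row of the CSV is simply username,destination_dir with no header. A minimal sketch of what profiles.csv could look like, assuming the two profiles from the example further down (the pairing of usernames to directories here is an assumption, purely for illustration):

mm1983_,/home/pictures/mm1983_
sonya210681,/home/pictures/sonya210681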

The code follows. Besides instaLooter itself, it requires requests; if you don't have it yet, install it with pip install requests.

#!/usr/bin/env python3

import csv
import io
import logging
import os
from argparse import ArgumentParser
from datetime import datetime, timedelta
from pathlib import Path

import requests
from instaLooter import InstaLooter

logging.basicConfig(format="{message}", style="{")
log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)

def load_insta_profiles(config_path: str):
    """ Reads a CSV configuration file whose rows are 'username,destination_dir'
    and yields a (username, destination Path) tuple per row. """
    with io.open(config_path, newline="") as config:
        profile_reader = csv.reader(config, delimiter=",")
        for profile, destination in profile_reader:
            yield (profile, Path(destination))

def filter_out_old_media(media_list, days, allow_videos=False):
    """ Yields only the media posted within the last `days` days,
    skipping videos unless allow_videos is True. """
    cutoff = datetime.now() - timedelta(days=days)
    for media in media_list:
        if media["is_video"] and not allow_videos:
            continue
        if datetime.fromtimestamp(media["date"]) >= cutoff:
            yield media

def pick_top_media(media_list, first_n=3):
    """ Yields the first_n media with the highest combined likes + comments count. """
    sorted_media = sorted(
        media_list,
        key=lambda m: m["likes"]["count"] + m["comments"]["count"],
        reverse=True,
    )
    yield from sorted_media[:first_n]

if __name__ == "__main__":
    arg = ArgumentParser()
    arg.add_argument("--config", help="CSV configuration file containing profiles to scrap.")
    arg.add_argument("--first-n", type=int, default=3, help="Download the first N pictures")
    arg.add_argument("--days", type=int, default=30, help="Consider only pictures from up to N days")
    opts = arg.parse_args()

    for profile, dest in load_insta_profiles(opts.config):
        looter = InstaLooter(profile=profile)
        recent_media = filter_out_old_media(looter.medias(), days=opts.days)
        os.makedirs(dest, exist_ok=True)

        # For every selected picture, download it and log "<profile> <media id> <comments> <likes>".
        for media in pick_top_media(recent_media, opts.first_n):
            response = requests.get(media["display_src"])
            with io.open(dest / f"{media['id']}.jpg", "wb") as f:
                f.write(response.content)
            log.info(f"{profile} {media['id']} {media['comments']['count']} {media['likes']['count']}")

Example of use:

Assuming the CSV file you gave me as an example, run the command like this:

scrap.py --config profiles.csv

This will create two directories: /home/pictures/mm1983_ and /home/pictures/sonya210681. As it runs, the program logs every download in the following format:

<profile> <media id> <comments> <likes>

Downloaded files are named <media id>.jpg. For additional options, see scrap.py --help. If you run into any issues, let me know!

Also, if you need anything extra, let me know.
thelostt 1 month ago
Minor fix: the script now correctly selects pictures from the last 30 days (it was doing the opposite).
thelostt 1 month ago
Thanks, I'll give it a try tomorrow; I have to install a couple of things and figure out the shared folder in VirtualBox. If I struggle too much, maybe we could do a quick screenshare session for a hefty tip. Thanks for the prompt work and the well-written response overall.
mattmg83 1 month ago
Got around to trying it, works like a charm, excellent work
mattmg83 1 month ago
Thank you!
thelostt 1 month ago
If I wanted to use a few additional options from here, should I add them to scrap.py, or pass them on the CLI like scrap.py --config profiles.csv -d --new -T {username}.{datetime}, for example?
mattmg83 1 month ago
You have to add that feature to the code; see the sketch below for one way a filename template could be handled.
thelostt 1 month ago
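For reference, here is a minimal, self-contained sketch of one way a filename template in the spirit of -T {username}.{datetime} could be handled inside scrap.py itself. The --template flag, the available fields, and the build_filename helper are assumptions for illustration, not part of the original script or of the instaLooter CLI:

# Hypothetical sketch: a --template option for scrap.py (not part of the original script).
from argparse import ArgumentParser
from datetime import datetime

def build_filename(template, profile, media):
    """ Expands a template such as "{username}.{datetime}" into a .jpg filename.
    The available fields ({id}, {username}, {datetime}) are an assumption. """
    taken_at = datetime.fromtimestamp(media["date"])
    return template.format(
        id=media["id"],
        username=profile,
        datetime=taken_at.strftime("%Y-%m-%d %Hh%Mm%Ss"),
    ) + ".jpg"

if __name__ == "__main__":
    arg = ArgumentParser()
    arg.add_argument("--template", default="{id}", help="Filename template (assumed option)")
    opts = arg.parse_args()

    # A fake media dict shaped like the ones scrap.py already consumes.
    media = {"id": "1234567890", "date": 1514764800}
    print(build_filename(opts.template, "mm1983_", media))

In scrap.py, the io.open(dest / f"{media['id']}.jpg", "wb") line would then use build_filename(opts.template, profile, media) instead of the hard-coded name.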
You can contact me via e-mail (see my Github) so we can work on that.
thelostt 1 month ago