Small Python script for file transfer

I need a simple Python script which will:

  1. Listen for changes to a Wilddog/Firebase node:

  2. On data update, fetch a list of file links to download from this node:

  3. Download all files from the links from step 2 and extract all XML files from the downloaded zip files

  4. Place the extracted XML files in a local folder and delete the zip files

  5. After completing steps 1 - 4, clear all data from this node:

  6. The script also needs to listen for new files arriving in a local folder and upload each new file via FTP to a folder on a remote server

  7. The local folder paths for steps 4 and 6, and the FTP details for step 6, should be dynamic parameters that can be changed


Wilddog is pretty much a copy of Firebase and their codebases are the same, so whatever works for Firebase should work for Wilddog.

They have a Python SDK if necessary:

awarded to feroldi


2 Solutions


I'm done with the base requirements; you may test here (follow the run instructions in the link):

Sorry, I have deleted the current listings.

The only remaining part is deploying the script to run forever. For this I'll need some information about your operating system to customize the script for it; for example, on Linux we can use cron jobs to run the script every x minutes, or run it as a long-lived process.
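The "run it as a process" option can be as simple as a wrapper loop around the sync routine. This is an illustrative sketch, not part of the delivered script; the interval and the `job` callable are placeholders:

```python
import time

def run_scheduled(job, interval_seconds=60, max_runs=None):
    # Call `job` repeatedly, sleeping between runs. With max_runs=None this
    # loops forever, which is the long-lived-process deployment option;
    # a finite max_runs is handy for testing.
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:
            # One failed run shouldn't kill the whole process.
            print(f"sync failed: {exc}")
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
    return runs
```

The cron alternative achieves the same effect by letting the OS invoke the script on a schedule instead of keeping a Python process alive.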

The script requires Python 2 and wilddog-python, which are already installed in the Colab link I shared. If you don't want to deploy the script locally, you can keep using this Colab and run it frequently; Colab is an awesome tool.

Kindly don't hesitate to ask any questions.

Best regards,


Thanks a lot! I'll be running this on either Ubuntu or macOS, so this should be compatible with both, right?
user0809 2 years ago
Btw, can you please make this into a single Python script file? Thanks!
user0809 2 years ago
You may download the script from here. Don't forget to install the required library and change the FTP information. Do you want the script to be a process that runs forever, or to run at a scheduled time? Possible options:
ahmedengu 2 years ago
Is the script working as expected? Do you need any help setting up your environment or changing anything? We can do a screen share if you need any help.
ahmedengu 2 years ago
Hey, sorry for the late reply, I was really busy with another task. There are two scripts in the GitHub link; which one should I use? Also, can you give a small example of how to run them? Thanks
user0809 2 years ago
No worries, I have added detailed instructions and simplified things; you can check it here:
ahmedengu 2 years ago
Winning solution

My solution covers both of your needs.
It downloads the archives (and keeps them in memory, because there's no need to keep them on disk), compares them with the local XML files extracted in previous runs, and uploads to the FTP server only those that differ or that don't exist locally.
And it does all that really fast.

Put it in a cron job and you are done, simple as that!


My solution requires Python 3.7 or later.
The following is the list of third-party dependencies used in my solution:

aiohttp
asyncssh

Save this list in a file named requirements.txt and install them with pip:

pip install -r /path/to/requirements.txt

Script usage

You can call it passing the -h flag to get a help message on how to use the script.
Anyhow, here's an example of the parameters:

$ python3.7 \
    --wilddog-url \
    --local-dir-path /tmp/ceb \
    --remote-dir-path /tmp/ceb \
    --sftp-host \
    --sftp-user some-user \
    --semaphore-file-path /tmp/semaphore.json

The script

The following is a listing of the complete script.

import aiohttp
import argparse
import asyncio
import asyncssh
import io
import json
import logging
import os
import pathlib
import sys
import zipfile
import urllib.parse

log = logging.getLogger(__name__)
logging.basicConfig(
    format="[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s"
)

def should_synchronize_files(xml_data, local_xml_path):
    try:
        with open(local_xml_path, "rb") as f:
            local_xml_data =
            if xml_data != local_xml_data:
                return True
    except FileNotFoundError:
        return True
    return False

async def synchronize_xml_files_with_server(
    archive_data, session, sftp, local_dir_path, remote_dir_path
):
    tasks = []
    with zipfile.ZipFile(archive_data) as archive:
        for archive_member in archive.namelist():
            # Check whether the downloaded XML file (referred to as `current`)
            # is brand new, or otherwise differs from the one on the server
            # (referred to as `last`).
            cur_xml_data =
            xml_file_path = pathlib.Path(local_dir_path, archive_member)
            files_differ = should_synchronize_files(
                xml_data=cur_xml_data, local_xml_path=xml_file_path
            )
            if files_differ:
      "extracting {archive_member}")
                # Update the XML file locally and upload it to the FTP server.
                with open(xml_file_path, "wb") as f:
                    f.write(cur_xml_data)
                tasks.append(
                    sftp.put(xml_file_path.as_posix(), remotepath=remote_dir_path)
                )
            else:
      "no action needed for {archive_member}")
    await asyncio.gather(*tasks)

async def fetch_archive_and_sync(archive_url, session, sftp, opts):"downloading {archive_url}")
    async with session.get(archive_url) as resp:
        archive_data = io.BytesIO(await
    await synchronize_xml_files_with_server(
        archive_data, session, sftp, opts.local_dir_path, opts.remote_dir_path
    )

async def main(opts):
    async with aiohttp.ClientSession() as session:
        async with session.get(opts.semaphore_url) as resp:
            cur_id = await resp.json()
        try:
            with open(opts.semaphore_file_path, "r") as f:
                last_id = json.load(f)
        except FileNotFoundError:
            last_id = None
        if cur_id != last_id:
            async with session.get(opts.xml_listings_url) as resp:
                zip_files_urls = await resp.json()
            os.makedirs(opts.local_dir_path, exist_ok=True)
            async with asyncssh.connect(
                opts.sftp_host, username=opts.sftp_user
            ) as ssh_conn:
                async with ssh_conn.start_sftp_client() as sftp:
                    if not await sftp.isdir(opts.remote_dir_path):
                        await sftp.mkdir(opts.remote_dir_path)
                    tasks = [
                        fetch_archive_and_sync(archive_url, session, sftp, opts)
                        for _, archive_url in zip_files_urls.items()
                    ]
                    await asyncio.gather(*tasks)

            # Synchronize the server's semaphore value with Wilddog's one.
  "synchronizing semaphore from `{last_id}` to `{cur_id}`")
            with open(opts.semaphore_file_path, "w") as f:
                json.dump(cur_id, f)

            async with session.delete(opts.xml_listings_url) as resp:
                assert resp.status == 200

if __name__ == "__main__":
    args = argparse.ArgumentParser()
    args.add_argument(
        "--wilddog-url",
        required=True,
        help="Wilddog space URL from which to fetch archives information.",
    )
    args.add_argument(
        "--local-dir-path",
        required=True,
        help=(
            "Local directory path to which XML files are stored. "
            "If PATH doesn't exist, it will be created automatically."
        ),
    )
    args.add_argument(
        "--remote-dir-path",
        required=True,
        help=(
            "Remote directory path on the SFTP server to which files are uploaded. "
            "If PATH doesn't exist on the server, it will be created automatically."
        ),
    )
    args.add_argument(
        "--sftp-host",
        required=True,
        help="Host of the SFTP server to which files are uploaded.",
    )
    args.add_argument(
        "--sftp-user",
        required=True,
        help="User name of the SFTP server.",
    )
    args.add_argument(
        "--semaphore-file-path",
        required=True,
        help=(
            "Path to the local semaphore JSON file. This is needed in order to keep "
            "track of the Wilddog's semaphore value. If PATH doesn't exist, it will "
            "be created automatically."
        ),
    )
    args.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help=(
            "This is a flag parameter which, when passed, makes the program "
            "report more information about its process."
        ),
    )

    opts = args.parse_args()
    opts.semaphore_url = urllib.parse.urljoin(
        opts.wilddog_url, "listings_xml_aux/aux_xml_send_semaphore_customs.json"
    )
    opts.xml_listings_url = urllib.parse.urljoin(
        opts.wilddog_url, "listings_xml_customs_id.json"
    )
    log.setLevel(logging.DEBUG if opts.verbose else logging.INFO)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(opts))
    # Zero-sleep to allow underlying connections to close.
    loop.run_until_complete(asyncio.sleep(0))
    loop.close()
Hey feroldi, one of the steps is missing: "After completing steps 1 - 4, clear all data from this node".
ahmedengu 2 years ago
@ahmedengu, thanks, done.
feroldi 2 years ago