Solution Timeline

All versions (edits) of solutions to "Mass download list of APKs by Package Names" appear below in the order they were created. Comments that appear under revisions were those created when that particular revision was current.

To see the revision history of a single solution (with diffs), click on the solution number (e.g. "#1") in the upper right corner of a solution revision below.


Work so far (posting here because the comments section is cluttered):

The App Annie API doesn't allow you to download category rankings unless you have an App Annie Intelligence subscription, which has a significant cost. OP wants a solution that can screen-scrape the App Annie website using a free account, which is possible, but fragile.

App Annie makes all their money by selling that kind of data, so I bet they put a significant amount of effort into preventing people from screen-scraping it, and if OP is doing these downloads frequently enough, I bet they'd notice. Also, I'd need to write a script that does a fake login to the App Annie website, and if any part of their site changes, or they start requiring a captcha to log in, that script could break.

Once you're already logged into the website, the data is easy enough to get with the URL https://www.appannie.com/ajax/top-chart/table/?market=google-play&country_code=US&category=<category>&date=<date>&rank_sorting_type=rank&page_size=500&order_by=sort_order&order_type=desc.

That gives data in the format:

[
  {
    "sort_metric": "changeInRank",
    "name": "Run Sausage Run!",
    "company_url": "/company/1000200000003181/",
    "headerquarters": "Israel",
    "id": 20600008190011,
    "url": "/apps/google-play/app/com.crazylabs.sausage.run/details/",
    "company_name": "TabTale",
    "country_code": "il",
    "app_icon_css": "gp",
    "iap": true,
    "change": 0,
    "icon": "https://static-s.aa-cdn.net/img/gp/20600008190011/5CKz2OBj0E2IQ_-Ms_r0u13rQ7KAzlgDVAVBWQdhTAn5jbh6ru349hkvjxD72x-CkAsy=w300_w80"
  }
]
[
  {
    "sort_metric": "changeInRank",
    "name": "Minecraft",
    "company_url": "/company/1000200000016666/",
    "headerquarters": "Sweden",
    "id": 20600000000768,
    "url": "/apps/google-play/app/com.mojang.minecraftpe/details/",
    "company_name": "Mojang",
    "country_code": "se",
    "app_icon_css": "gp",
    "iap": true,
    "change": 0,
    "icon": "https://static-s.aa-cdn.net/img/gp/20600000000768/VSwHQjcAttxsLE47RuS4PqpC4LT7lCoSjE7Hx5AW_yCxtDvcnsHHvm5CTuL5BPN-uRTP=w300_w80"
  }
]
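
For illustration, here is a minimal sketch of querying that endpoint with requests, assuming you copy the session cookies from an already logged-in browser session (the cookie name and value below are placeholders, and the endpoint or its parameters could change at any time):

import requests

# Placeholder: copy the real cookie(s) from a logged-in App Annie browser session.
SESSION_COOKIES = {"sessionid": "<your-session-cookie>"}

URL = ("https://www.appannie.com/ajax/top-chart/table/"
       "?market=google-play&country_code=US&category={category}&date={date}"
       "&rank_sorting_type=rank&page_size=500&order_by=sort_order&order_type=desc")


def fetch_top_chart(category, date):
    """Fetch one top-chart page; the response looks like the JSON sample above."""
    r = requests.get(URL.format(category=category, date=date),
                     cookies=SESSION_COOKIES)
    r.raise_for_status()
    return r.json()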

If you're scraping this data infrequently, and don't mind it not being fully automated, I could make a browser extension that exports the category lists after you've already logged in... that would be pretty easy and wouldn't break if they change around their login flow.

Here is my solution, following the revised bounty:

import sys

from bs4 import BeautifulSoup
import requests


DOMAIN = "https://apkpure.com"
SEARCH_URL = DOMAIN + "/search?q=%s"


def download_file(url, package_name):
    file_name = "%s.apk" % (package_name.replace(".", "_"))
    local_path = "./downloaded/%s" % file_name

    r = requests.get(url, stream=True)
    print("Downloading %s... " % file_name, end="")

    total_size = int(r.headers.get('content-length', 0))
    size = 0
    print("% 6.2f%%" % 0.0, end="")
    with open(local_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=65536):
            if chunk:
                size += len(chunk)
                f.write(chunk)

                print("\b" * 7, end="")
                print("% 6.2f%%" % (size / total_size * 100), end="")
                sys.stdout.flush()
    print("\b" * 7, end="")
    print("100.00%")

    return (local_path, size)


if __name__ == '__main__':
    output_csv = open("output.csv", "w")
    output_csv.write("App name,Package name,Size,Location\n")

    for line in open("apk_list.txt", "r").readlines():
        package_name = line.strip()

        # Search page
        url = SEARCH_URL % package_name
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get search page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        first_result = soup.find("dl", class_="search-dl")
        if first_result is None:
            print("Could not find %s" % package_name)
            continue

        search_title = first_result.find("p", class_="search-title")
        search_title_a = search_title.find("a")

        app_name = search_title.text.strip()
        app_url = search_title_a.attrs["href"]


        # App page
        url = DOMAIN + app_url
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get app page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_button = soup.find("a", class_=" da")

        if download_button is None:
            print("%s is a paid app. Could not download." % package_name)
            continue

        download_url = download_button.attrs["href"]


        # Download app page
        url = DOMAIN + download_url
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get app download page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_link = soup.find("a", id="download_link")
        download_apk_url = download_link.attrs["href"]

        path, size = download_file(download_apk_url, package_name)

        # Write line to output CSV
        escaped_app_name = app_name.replace(",", "_")
        output_csv.write(",".join([escaped_app_name, package_name, str(size), path]))
        output_csv.write("\n")

Tested on Windows with Python 3.6. It requires requests and BeautifulSoup (bs4). The file containing the list of package names (apk_list.txt) is just a text file with one entry per line. I tested the script with the two example package names you gave.

Here is my solution, following the revised bounty:

import os
import os.path
import sys
import re

from bs4 import BeautifulSoup
import requests


DOMAIN = "https://apkpure.com"
SEARCH_URL = DOMAIN + "/search?q=%s"

DOWNLOAD_DIR = "./downloaded"
PACKAGE_NAME_FILE = "package_names.txt"


def download_file(url, package_name):
    r = requests.get(url, stream=True)

    content_disposition = r.headers.get("content-disposition")
    filename = re.search(r'attachment; filename="(.*)"', content_disposition).groups()
    if filename:
        filename = filename[0]
    else:
        filename = "%s.apk" % (package_name.replace(".", "_"))

    local_path = os.path.normpath(os.path.join(DOWNLOAD_DIR, filename))
    sys.stdout.write("Downloading %s... " % filename)

    total_size = int(r.headers.get('content-length', 0))
    size = 0
    sys.stdout.write("% 6.2f%%" % 0.0)
    with open(local_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=65536):
            if chunk:
                size += len(chunk)
                f.write(chunk)

                sys.stdout.write("\b" * 7)
                sys.stdout.write("% 6.2f%%" % (size / total_size * 100))
                sys.stdout.flush()
    sys.stdout.write("\b" * 7)
    sys.stdout.write("100.00%\n")

    return (local_path, size)


if __name__ == '__main__':
    # Output CSV
    output_csv = open("output.csv", "w")
    output_csv.write("App name,Package name,Size,Location\n")


    # Create download directory
    if not os.path.exists(DOWNLOAD_DIR):
        os.mkdir(DOWNLOAD_DIR)
    elif not os.path.isdir(DOWNLOAD_DIR):
        print("%s is not a directory." % DOWNLOAD_DIR)
        sys.exit(-1)


    for line in open(PACKAGE_NAME_FILE, "r").readlines():
        package_name = line.strip()

        # Search page
        url = SEARCH_URL % package_name
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get search page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        first_result = soup.find("dl", class_="search-dl")
        if first_result is None:
            print("Could not find %s" % package_name)
            continue

        search_title = first_result.find("p", class_="search-title")
        search_title_a = search_title.find("a")

        app_name = search_title.text.strip()
        app_url = search_title_a.attrs["href"]


        # App page
        url = DOMAIN + app_url
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get app page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_button = soup.find("a", class_=" da")

        if download_button is None:
            print("%s is a paid app. Could not download." % package_name)
            continue

        download_url = download_button.attrs["href"]


        # Download app page
        url = DOMAIN + download_url
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get app download page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_link = soup.find("a", id="download_link")
        download_apk_url = download_link.attrs["href"]

        path, size = download_file(download_apk_url, package_name)


        # Write row to output CSV
        output_csv.write(",".join([
            '"%s"' % app_name,
            '"%s"' % package_name,
            "%d" % size,
            '"%s"' % path]))
        output_csv.write("\n")

The script requires requests and bs4 (BeautifulSoup). The file containing the list of package names (package_names.txt) is just a text file with one entry per line. I tested the script with the two example package names you gave.

I tested the script on Windows and Ubuntu with Python 3.6. It also runs with Python 2.7, but requests has trouble making HTTPS requests there.

Edit: Added mkdir for the download directory. Added double quotes around the CSV entries. Made the download function parse the filename from the Content-Disposition header. Made the script run on Python 2.7 (although it doesn't work there because of HTTPS issues with requests).
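
As an aside, Python's built-in csv module would handle the quoting and escaping automatically rather than wrapping fields in double quotes by hand; a minimal sketch (Python 3 shown, not part of the submitted script):

import csv

def write_output(rows, csv_path="output.csv"):
    """rows: iterable of (app_name, package_name, size, local_path) tuples."""
    # newline="" avoids blank lines on Windows; csv.writer quotes fields
    # containing commas or quotes for us.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["App name", "Package name", "Size", "Location"])
        writer.writerows(rows)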

Winning solution

Multi-threaded Python 2.7 program

main.py

import threading
import uuid
from Queue import Queue
from spider import Spider
from general import *
import time
import urllib2

PROJECT_NAME = 'downloaded_directory'

HOMEPAGE = 'https://apkpure.com'
APP_LIST = 'app_list.txt'
# get the first apk name from text file

NUMBER_OF_THREADS = 4
queue = Queue()
Spider(PROJECT_NAME, HOMEPAGE, APP_LIST)
MAX_REQ = 50
x = 1
threads = []
def create_spider():
    for _ in range(NUMBER_OF_THREADS):
        t = threading.Thread(target=work)
        threads.append(t)
        t.daemon = True
        t.start()



def work():
    global x
    while True:
        if x >= MAX_REQ:
            x = 1
            time.sleep(5)
            print "sleeping 5 sec"
        apk = queue.get()
        Spider.crawl_page(threading.current_thread().name, apk)
        queue.task_done()
        x +=1


def create_jobs():
    for link in file_to_set(APP_LIST):
        queue.put(link)
    queue.join()
    crawl()


def crawl():
    queued_links = file_to_set(APP_LIST)
    if len(queued_links) > 0:
        print(str(len(queued_links)) + ' links in the queue')
        create_jobs()

create_spider()
crawl()

def download_apk():
    with open('crawled_list.txt') as f:
        for line in f:
            # Each line is the direct download link of an APK; plug in a
            # download function here (e.g. cytebode's download_file).
            pass

download_apk()

spider.py

from bs4 import BeautifulSoup
import requests
from general import *

class Spider:

    project_name = ''
    queue_file = ''
    crawled_file = ''
    search_page = ''
    queue = set()
    crawled = set()

    def __init__(self, project_name, search_page, app_list):
        Spider.project_name = project_name
        Spider.search_page = search_page

        Spider.queue_file = app_list
        Spider.crawled_file = 'crawled_list.txt'
        self.boot()
        #self.crawl_page('Pioneer spider', Spider.base_apk)


    @staticmethod
    def boot():
        create_project_dir(Spider.project_name)
        create_crawled_list(Spider.crawled_file)
        Spider.queue = file_to_set(Spider.queue_file)
        Spider.crawled = file_to_set(Spider.crawled_file)


    @staticmethod
    def crawl_page(thread_name, apk):
        if apk not in Spider.crawled:
            print(thread_name + ' now crawling ' + apk)
            print('Queue ' + str(len(Spider.queue)) + ' | Crawled  ' + str(len(Spider.crawled)))
            s = Spider.gather_download_link(Spider.search_page+'/search?q=' + apk)
            Spider.add_link_to_queue(s)
            Spider.queue.remove(apk)
            Spider.update_files()


    @staticmethod
    def gather_download_link(search_url):

        try:
            response = requests.get(search_url, stream=True)
            soup = BeautifulSoup(response.text, "html.parser")
            link_part = soup.findAll('a', attrs={'class': 'more-down'})[0]['href']
            response_1 = requests.get(Spider.search_page+link_part+'/download?from=details', stream=True)
            soup_1 = BeautifulSoup(response_1.text, "html.parser")
        except Exception as e:
            print(str(e))
            return set()
        return soup_1.findAll('a', attrs={'id': 'download_link'})[0]['href']


    @staticmethod
    def add_link_to_queue(link):
        if link not in Spider.crawled:
            Spider.crawled.add(link)

    @staticmethod
    def update_files():
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)

general.py

import os

def create_project_dir(directory):
    if not os.path.exists(directory):
        print('Wait Creating directory ' + directory)
        os.makedirs(directory)


def create_crawled_list(crawled_list):
    if not os.path.isfile(crawled_list):
        write_file(crawled_list, '')


def write_file(path, data):
    with open(path, 'w') as f:
        f.write(data)


def append_to_file(path, data):
    with open(path, 'a') as file:
        file.write(data + '\n')


def delete_file_contents(path):
    open(path, 'w').close()


def file_to_set(file_name):
    results = set()
    with open(file_name, 'rt') as f:
        for line in f:
            results.add(line.replace('\n', ''))
    return results


def set_to_file(links, file_name):
    with open(file_name,"w") as f:
        for l in sorted(links):
            f.write(l+"\n")

NOTE
1. Place all files in the same folder.
2. Create a text file named app_list.txt containing the list of app (package) names, one per line.
3. This is a multithreaded application, which means it can find multiple download links at the same time, making it suitable for large lists (a thread-safe throttling sketch follows this list).
4. I have not written the download function; you can use the function provided by cytebode.
5. All the download links the program finds are saved to a separate text file named crawled_list.txt (the program creates this file automatically).
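
Regarding point 3: the throttling in work() relies on a plain global counter (x) shared by all the threads with no synchronization. Below is a minimal sketch of the same idea with a lock around the counter so the updates are thread-safe; the names (throttle, request_count) are mine, not part of the submitted program:

import threading
import time

_counter_lock = threading.Lock()
request_count = 0  # shared by all worker threads

def throttle(max_requests=50, pause_seconds=5):
    """Sleep for pause_seconds once every max_requests calls."""
    global request_count
    with _counter_lock:
        request_count += 1
        pause = request_count >= max_requests
        if pause:
            request_count = 0
    if pause:
        print("sleeping %d sec" % pause_seconds)
        time.sleep(pause_seconds)

# In work(), throttle() could be called at the top of the loop in place of
# the manual x / MAX_REQ bookkeeping.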

Multi-threaded Python 2.7 program

main.py

import threading
import uuid
from Queue import Queue
from spider import Spider
from general import *
import time
import urllib2

PROJECT_NAME = 'downloaded_directory'

HOMEPAGE = 'https://apkpure.com'
APP_LIST = 'app_list.txt'
# get the first apk name from text file

NUMBER_OF_THREADS = 4
queue = Queue()
Spider(PROJECT_NAME, HOMEPAGE, APP_LIST)
MAX_REQ = 50
x = 1
threads = []
def create_spider():
    for _ in range(NUMBER_OF_THREADS):
        t = threading.Thread(target=work)
        threads.append(t)
        t.daemon = True
        t.start()



def work():
    global x
    while True:
        if x >= MAX_REQ:
            x = 1
            time.sleep(5)
            print "sleeping 5 sec"
        apk = queue.get()
        Spider.crawl_page(threading.current_thread().name, apk)
        queue.task_done()
        x +=1


def create_jobs():
    for link in file_to_set(APP_LIST):
        queue.put(link)
    queue.join()
    crawl()


def crawl():
    queued_links = file_to_set(APP_LIST)
    if len(queued_links) > 0:
        print(str(len(queued_links)) + ' links in the queue')
        create_jobs()

create_spider()
crawl()

def download_apk():
    with open('crawled_list.txt') as f:
        for line in f:
            # Each line here is the download link of the APK; you can use
            # cytebode's download function to download the file.
            pass

download_apk()

spider.py

from bs4 import BeautifulSoup
import requests
from general import *

class Spider:

    project_name = ''
    queue_file = ''
    crawled_file = ''
    search_page = ''
    queue = set()
    crawled = set()

    def __init__(self, project_name, search_page, app_list):
        Spider.project_name = project_name
        Spider.search_page = search_page

        Spider.queue_file = app_list
        Spider.crawled_file = 'crawled_list.txt'
        self.boot()
        #self.crawl_page('Pioneer spider', Spider.base_apk)


    @staticmethod
    def boot():
        create_project_dir(Spider.project_name)
        create_crawled_list(Spider.crawled_file)
        Spider.queue = file_to_set(Spider.queue_file)
        Spider.crawled = file_to_set(Spider.crawled_file)


    @staticmethod
    def crawl_page(thread_name, apk):
        if apk not in Spider.crawled:
            print(thread_name + ' now crawling ' + apk)
            print('Queue ' + str(len(Spider.queue)) + ' | Crawled  ' + str(len(Spider.crawled)))
            s = Spider.gather_download_link(Spider.search_page+'/search?q=' + apk)
            Spider.add_link_to_queue(s)
            Spider.queue.remove(apk)
            Spider.update_files()


    @staticmethod
    def gather_download_link(search_url):

        try:
            response = requests.get(search_url, stream=True)
            soup = BeautifulSoup(response.text, "html.parser")
            link_part = soup.findAll('a', attrs={'class': 'more-down'})[0]['href']
            response_1 = requests.get(Spider.search_page+link_part+'/download?from=details', stream=True)
            soup_1 = BeautifulSoup(response_1.text, "html.parser")
        except Exception as e:
            print(str(e))
            return set()
        return soup_1.findAll('a', attrs={'id': 'download_link'})[0]['href']


    @staticmethod
    def add_link_to_queue(link):
        if link not in Spider.crawled:
            Spider.crawled.add(link)

    @staticmethod
    def update_files():
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)

general.py

import os

def create_project_dir(directory):
    if not os.path.exists(directory):
        print('Wait Creating directory ' + directory)
        os.makedirs(directory)



def create_crawled_list(crawled_list):
    if not os.path.isfile(crawled_list):
        write_file(crawled_list, '')


def write_file(path, data):
    with open(path, 'w') as f:
        f.write(data)


def append_to_file(path, data):
    with open(path, 'a') as file:
        file.write(data + '\n')



def delete_file_contents(path):
    open(path, 'w').close()


def file_to_set(file_name):
    results = set()
    with open(file_name, 'rt') as f:
        for line in f:
            results.add(line.replace('\n', ''))
    return results


def set_to_file(links, file_name):
    with open(file_name,"w") as f:
        for l in sorted(links):
            f.write(l+"\n")

NOTE
1. Place all files in the same folder.
2. Create a text file named app_list.txt containing the list of app (package) names, one per line.
3. This is a multithreaded application, which means it can find multiple download links at the same time, making it suitable for large lists.
4. I have not written the download function; you can use the function provided by cytebode.
5. All the download links the program finds are saved to a separate text file named crawled_list.txt (the program creates this file automatically).

Multi-threaded Python 2.7 program

main.py

import threading
import uuid
from Queue import Queue
from spider import Spider
from general import *
import time
import urllib2

PROJECT_NAME = 'downloaded_directory' # the directory in which you want to download the apk

HOMEPAGE = 'https://apkpure.com'  
APP_LIST = 'app_list.txt'

NUMBER_OF_THREADS = 4 # no of threads you want
queue = Queue()
Spider(PROJECT_NAME, HOMEPAGE, APP_LIST)
MAX_REQ = 50  # maximum number of requests before the threads pause for 5 seconds
x = 1
def create_spider():
    for _ in range(NUMBER_OF_THREADS):
        t = threading.Thread(target=work)
        #threads.append(t)
        t.daemon = True
        t.start()



def work():
    global x
    while True:
        if x >= MAX_REQ:
            x = 1
            time.sleep(5)
            print "sleeping 5 sec"
        apk = queue.get()
        Spider.crawl_page(threading.current_thread().name, apk)
        queue.task_done()
        x +=1


def create_jobs():
    for link in file_to_set(APP_LIST):
        queue.put(link)
    queue.join()
    crawl()


def crawl():
    queued_links = file_to_set(APP_LIST)
    if len(queued_links) > 0:
        print(str(len(queued_links)) + ' links in the queue')
        create_jobs()

create_spider()
crawl()

# function to download the APK files
# it reads the crawled_list.txt file generated by the program, which contains the download links of the APKs fetched above
def download_apk():
    with open('crawled_list.txt') as f:
        for line in f:
            # Each line here is the download link of the APK; you can use
            # cytebode's download function to download the file.
            pass

download_apk()

spider.py

from bs4 import BeautifulSoup
import requests
from general import *

class Spider:

    project_name = ''
    queue_file = ''
    crawled_file = ''
    search_page = ''
    queue = set()
    crawled = set()

    def __init__(self, project_name, search_page, app_list):
        Spider.project_name = project_name
        Spider.search_page = search_page

        Spider.queue_file = app_list
        Spider.crawled_file = 'crawled_list.txt'
        self.boot()


    @staticmethod
    def boot():
        create_project_dir(Spider.project_name)
        create_crawled_list(Spider.crawled_file)
        Spider.queue = file_to_set(Spider.queue_file)
        Spider.crawled = file_to_set(Spider.crawled_file)


    @staticmethod
    def crawl_page(thread_name, apk):
        if apk not in Spider.crawled:
            print(thread_name + ' now crawling ' + apk)
            print('Queue ' + str(len(Spider.queue)) + ' | Crawled  ' + str(len(Spider.crawled)))
            s = Spider.gather_download_link(Spider.search_page+'/search?q=' + apk)
            Spider.add_link_to_queue(s)
            Spider.queue.remove(apk)
            Spider.update_files()


    @staticmethod
    def gather_download_link(search_url):

        try:
            response = requests.get(search_url, stream=True)
            soup = BeautifulSoup(response.text, "html.parser")
            link_part = soup.findAll('a', attrs={'class': 'more-down'})[0]['href']
            response_1 = requests.get(Spider.search_page+link_part+'/download?from=details', stream=True)
            soup_1 = BeautifulSoup(response_1.text, "html.parser")
        except Exception as e:
            print(str(e))
            return set()
        return soup_1.findAll('a', attrs={'id': 'download_link'})[0]['href']


    @staticmethod
    def add_link_to_queue(link):
        if link not in Spider.crawled:
            Spider.crawled.add(link)

    @staticmethod
    def update_files():
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)

general.py

import os

def create_project_dir(directory):
    if not os.path.exists(directory):
        print('Wait Creating directory ' + directory)
        os.makedirs(directory)



def create_crawled_list(crawled_list):
    if not os.path.isfile(crawled_list):
        write_file(crawled_list, '')


def write_file(path, data):
    with open(path, 'w') as f:
        f.write(data)


def append_to_file(path, data):
    with open(path, 'a') as file:
        file.write(data + '\n')



def delete_file_contents(path):
    open(path, 'w').close()


def file_to_set(file_name):
    results = set()
    with open(file_name, 'rt') as f:
        for line in f:
            results.add(line.replace('\n', ''))
    return results


def set_to_file(links, file_name):
    with open(file_name,"w") as f:
        for l in sorted(links):
            f.write(l+"\n")

NOTE
1. Place all files in the same folder.
2. Create a text file named app_list.txt containing the list of app (package) names, one per line.
3. This is a multithreaded application, which means it can find multiple download links at the same time, making it suitable for large lists.
4. I have not written the download function; you can use the function provided by cytebode.
5. All the download links the program finds are saved to a separate text file named crawled_list.txt (the program creates this file automatically).

Here is my solution using Node.js Runtime, with few dependencies.

I made a repository for my solution; it is called APK Scrape.

Also, it is not yet published on npm; if you want, I can publish it there.

Here is a copy/paste of the readme file:

APK Scrape

Scrape and download APK using package identifier from APKPure

Usage

apk-scrape -p ./packages.txt -d ./download

Installation

Clone the repository:

git clone https://github.com/EmpireWorld/apk-scrape.git

Use the command line:

./apk-scrape -p ./packages.txt -d ./download

Help

apk-scrape --help

Multi-threaded Python 2.7 program

main.py

import threading
import uuid
from Queue import Queue
from spider import Spider
from general import *
import time
import urllib2

PROJECT_NAME = 'downloaded_directory' # the directory in which you want to download the apk

HOMEPAGE = 'https://apkpure.com'  
APP_LIST = 'app_list.txt'

NUMBER_OF_THREADS = 4 # no of threads you want
queue = Queue()
Spider(PROJECT_NAME, HOMEPAGE, APP_LIST)
MAX_REQ = 50  # maximum number of requests before the threads pause for 5 seconds
x = 1
def create_spider():
    for _ in range(NUMBER_OF_THREADS):
        t = threading.Thread(target=work)
        #threads.append(t)
        t.daemon = True
        t.start()



def work():
    global x
    while True:
        if x >= MAX_REQ:
            x = 1
            time.sleep(5)
            print "sleeping 5 sec"
        apk = queue.get()
        Spider.crawl_page(threading.current_thread().name, apk)
        queue.task_done()
        x +=1


def create_jobs():
    for link in file_to_set(APP_LIST):
        queue.put(link)
    queue.join()
    crawl()


def crawl():
    queued_links = file_to_set(APP_LIST)
    if len(queued_links) > 0:
        print(str(len(queued_links)) + ' links in the queue')
        create_jobs()

create_spider()
crawl()

# function to download the APK files
# it reads the crawled_list.txt file generated by the program, which contains the download links of the APKs fetched above
def download_apk():
    with open('crawled_list.txt') as f:
        for line in f:
            # Each line here is the download link of the APK; you can use
            # cytebode's download function to download the file.
            pass

download_apk()

spider.py

from bs4 import BeautifulSoup
import requests
from general import *

class Spider:

    project_name = ''
    queue_file = ''
    crawled_file = ''
    search_page = ''
    queue = set()
    crawled = set()

    def __init__(self, project_name, search_page, app_list):
        Spider.project_name = project_name
        Spider.search_page = search_page

        Spider.queue_file = app_list
        Spider.crawled_file = 'crawled_list.txt'
        self.boot()


    @staticmethod
    def boot():
        create_project_dir(Spider.project_name)
        create_crawled_list(Spider.crawled_file)
        Spider.queue = file_to_set(Spider.queue_file)
        Spider.crawled = file_to_set(Spider.crawled_file)


    @staticmethod
    def crawl_page(thread_name, apk):
        if apk not in Spider.crawled:
            print(thread_name + ' now crawling ' + apk)
            print('Queue ' + str(len(Spider.queue)) + ' | Crawled  ' + str(len(Spider.crawled)))
            s = Spider.gather_download_link(Spider.search_page+'/search?q=' + apk)
            Spider.add_link_to_queue(s)
            Spider.queue.remove(apk)
            Spider.update_files()


    @staticmethod
    def gather_download_link(search_url):

        try:
            response = requests.get(search_url, stream=True)
            soup = BeautifulSoup(response.text, "html.parser")
            link_part = soup.findAll('a', attrs={'class': 'more-down'})[0]['href']
            response_1 = requests.get(Spider.search_page+link_part+'/download?from=details', stream=True)
            soup_1 = BeautifulSoup(response_1.text, "html.parser")
        except Exception as e:
            print(str(e))
            return set()
        return soup_1.findAll('a', attrs={'id': 'download_link'})[0]['href']


    @staticmethod
    def add_link_to_queue(link):
        if link not in Spider.crawled:
            Spider.crawled.add(link)

    @staticmethod
    def update_files():
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)

general.py

import os

def create_project_dir(directory):
    if not os.path.exists(directory):
        print('Wait Creating directory ' + directory)
        os.makedirs(directory)



def create_crawled_list(crawled_list):
    if not os.path.isfile(crawled_list):
        write_file(crawled_list, '')


def write_file(path, data):
    with open(path, 'w') as f:
        f.write(data)


def append_to_file(path, data):
    with open(path, 'a') as file:
        file.write(data + '\n')



def delete_file_contents(path):
    open(path, 'w').close()


def file_to_set(file_name):
    results = set()
    with open(file_name, 'rt') as f:
        for line in f:
            results.add(line.replace('\n', ''))
    return results


def set_to_file(links, file_name):
    with open(file_name,"w") as f:
        for l in sorted(links):
            f.write(l+"\n")

NOTE
1. Place all files in the same folder.
2. Create a text file named app_list.txt containing the list of app (package) names, one per line.
3. This is a multithreaded application, which means it can find multiple download links at the same time, making it suitable for large lists.
4. I have not written the download function; you can use the function provided by cytebode.
5. All the download links the program finds are saved to a separate text file named crawled_list.txt (the program creates this file automatically).

Multi-threaded Python 2.7 program

main.py

import threading
import uuid
from Queue import Queue
from spider import Spider
from general import *
import time
import urllib2

PROJECT_NAME = 'downloaded_directory' # the directory in which you want to download the apk

HOMEPAGE = 'https://apkpure.com'  
APP_LIST = 'app_list.txt'

NUMBER_OF_THREADS = 4 # no of threads you want
queue = Queue()
Spider(PROJECT_NAME, HOMEPAGE, APP_LIST)
MAX_REQ = 50  # maximum number of requests before the threads pause for 5 seconds
x = 1
def create_spider():
    for _ in range(NUMBER_OF_THREADS):
        t = threading.Thread(target=work)
        #threads.append(t)
        t.daemon = True
        t.start()



def work():
    global x
    while True:
        if x >= MAX_REQ:
            x = 1
            time.sleep(5)
            print "sleeping 5 sec"
        apk = queue.get()
        Spider.crawl_page(threading.current_thread().name, apk)
        queue.task_done()
        x +=1


def create_jobs():
    for link in file_to_set(APP_LIST):
        queue.put(link)
    queue.join()
    crawl()


def crawl():
    queued_links = file_to_set(APP_LIST)
    if len(queued_links) > 0:
        print(str(len(queued_links)) + ' links in the queue')
        create_jobs()

create_spider()
crawl()

# function to download the APK files
# it reads the crawled_list.txt file generated by the program, which contains the download links of the APKs fetched above
def download_apk():
    with open('crawled_list.txt') as f:
        for line in f:
            # Each line here is the download link of the APK; you can use
            # cytebode's download function to download the file.
            pass

download_apk()

spider.py

from bs4 import BeautifulSoup
import requests
from general import *

class Spider:

    project_name = ''
    queue_file = ''
    crawled_file = ''
    search_page = ''
    queue = set()
    crawled = set()

    def __init__(self, project_name, search_page, app_list):
        Spider.project_name = project_name
        Spider.search_page = search_page

        Spider.queue_file = app_list
        Spider.crawled_file = 'crawled_list.txt'
        self.boot()


    @staticmethod
    def boot():
        create_project_dir(Spider.project_name)
        create_crawled_list(Spider.crawled_file)
        Spider.queue = file_to_set(Spider.queue_file)
        Spider.crawled = file_to_set(Spider.crawled_file)


    @staticmethod
    def crawl_page(thread_name, apk):
        if apk not in Spider.crawled:
            print(thread_name + ' now crawling ' + apk + '\n')
            print('Queue ' + str(len(Spider.queue)) + ' | Crawled  ' + str(len(Spider.crawled)))
            s = Spider.gather_download_link(Spider.search_page+'/search?q=' + apk)
            Spider.add_link_to_queue(s)
            Spider.queue.remove(apk)
            Spider.update_files()


    @staticmethod
    def gather_download_link(search_url):

        try:
            response = requests.get(search_url, stream=True)
            soup = BeautifulSoup(response.text, "html.parser")
            list = soup.findAll('a', attrs={'class': 'more-down'})
            if list:
                link_part = list[0]['href']
                response_1 = requests.get(Spider.search_page+link_part+'/download?from=details', stream=True)
                soup_1 = BeautifulSoup(response_1.text, "html.parser")
                inner_list = soup_1.findAll('a', attrs={'id': 'download_link'})
                if inner_list:
                    return inner_list[0]['href']
        except Exception as e:
            print(str(e))
            return set()


    @staticmethod
    def add_link_to_queue(link):
        if link not in Spider.crawled:
            Spider.crawled.add(link)

    @staticmethod
    def update_files():
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)

general.py

import os

def create_project_dir(directory):
    if not os.path.exists(directory):
        print('Wait Creating directory ' + directory)
        os.makedirs(directory)



def create_crawled_list(crawled_list):
    if not os.path.isfile(crawled_list):
        write_file(crawled_list, '')


def write_file(path, data):
    with open(path, 'w') as f:
        f.write(data)


def append_to_file(path, data):
    with open(path, 'a') as file:
        file.write(data + '\n')



def delete_file_contents(path):
    open(path, 'w').close()


def file_to_set(file_name):
    results = set()
    with open(file_name, 'rt') as f:
        for line in f:
            results.add(line.replace('\n', ''))
    return results


def set_to_file(links, file_name):
    with open(file_name,"w") as f:
        for l in sorted(links):
            f.write(l+"\n")

NOTE
1. Place all files in the same folder.
2. Create a text file named app_list.txt containing the list of app (package) names, one per line.
3. This is a multithreaded application, which means it can find multiple download links at the same time, making it suitable for large lists.
4. I have not written the download function; you can use the function provided by cytebode (see the sketch after this list).
5. All the download links the program finds are saved to a separate text file named crawled_list.txt (the program creates this file automatically).
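
Regarding point 4, here is a rough sketch of how download_apk() could be filled in, pairing each crawled link with a simple streamed download along the lines of cytebode's download_file. The filename-from-URL fallback is my assumption, not something the submitted program does:

import os
import requests

DOWNLOAD_DIR = 'downloaded_directory'

def download_apk():
    """Download every APK whose direct link was written to crawled_list.txt."""
    with open('crawled_list.txt') as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            # Crude filename guess from the URL; cytebode's version reads the
            # Content-Disposition header instead.
            filename = url.split('/')[-1].split('?')[0] or 'app.apk'
            local_path = os.path.join(DOWNLOAD_DIR, filename)
            r = requests.get(url, stream=True)
            with open(local_path, 'wb') as out:
                for chunk in r.iter_content(chunk_size=65536):
                    if chunk:
                        out.write(chunk)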

Here's an updated version of my solution, with concurrent downloads:

import enum
from multiprocessing import Process, Queue
import os
import os.path
import re
import time

try:
    # Python 3
    from queue import Empty as EmptyQueueException
except ImportError:
    # Python 2
    from Queue import Empty as EmptyQueueException

from bs4 import BeautifulSoup
import requests


DOMAIN = "https://apkpure.com"
SEARCH_URL = DOMAIN + "/search?q=%s"

DOWNLOAD_DIR = "./downloaded/"
PACKAGE_NAMES_FILE = "package_names.txt"
OUTPUT_CSV = "output.csv"


CONCURRENT_DOWNLOADS = 4
PROCESS_TIMEOUT = 10.0


class Message(enum.Enum):
    error = -1
    payload = 0
    start = 1
    end = 2


def download_process(qi, qo):
    while True:
        message = qi.get()

        if message[0] == Message.payload:
            package_name, app_name, download_url = message[1]
        elif message[0] == Message.end:
            break

        # Streamed GET request; the response headers give the filename and size
        r = requests.get(download_url, stream=True)

        if r.status_code != 200:
            qo.put((Message.error, "HTTP Error %d" % r.status_code))
            r.close()
            continue

        content_disposition = r.headers.get("content-disposition", "")
        content_length = int(r.headers.get('content-length', 0))

        filename = re.search(r'filename="(.*)"', content_disposition)
        if filename and filename.groups():
            filename = filename.groups()[0]
        else:
            filename = "%s.apk" % (package_name.replace(".", "_"))

        local_path = os.path.normpath(os.path.join(DOWNLOAD_DIR, filename))

        if os.path.exists(local_path):
            if not os.path.isfile(local_path):
                # Not a file
                qo.put((Message.error, "%s is a directory." % local_path))
                r.close()
                continue
            if os.path.getsize(local_path) == content_length:
                # File has likely already been downloaded
                qo.put((Message.end, (package_name, app_name, content_length, local_path)))
                r.close()
                continue

        qo.put((Message.start, package_name))

        size = 0
        with open(local_path, "wb+") as f:
            for chunk in r.iter_content(chunk_size=65536):
                if chunk:
                    size += len(chunk)
                    f.write(chunk)

        qo.put((Message.payload, (package_name, app_name, size, local_path)))


def search_process(qi, qo):
    while True:
        message = qi.get()

        if message[0] == Message.payload:
            package_name = message[1]
        elif message[0] == Message.end:
            break

        # Search page
        url = SEARCH_URL % package_name
        r = requests.get(url)

        if r.status_code != 200:
            qo.put((Message.error, "Could not get search page for %s." % package_name))
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        first_result = soup.find("dl", class_="search-dl")
        if first_result is None:
            qo.put((Message.error, "Could not find %s." % package_name))
            continue

        search_title = first_result.find("p", class_="search-title")
        search_title_a = search_title.find("a")

        app_name = search_title.text.strip()
        app_url = search_title_a.attrs["href"]


        # App page
        url = DOMAIN + app_url
        r = requests.get(url)

        if r.status_code != 200:
            qo.put((Message.error, "Could not get app page for %s." % package_name))
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_button = soup.find("a", class_=" da")

        if download_button is None:
            qo.put((Message.error, "%s is a paid app. Could not download." % package_name))
            continue

        download_url = download_button.attrs["href"]


        # Download app page
        url = DOMAIN + download_url
        r = requests.get(url)

        if r.status_code != 200:
            qo.put((Message.error, "Could not get app download page for %s." % package_name))
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_link = soup.find("a", id="download_link")
        download_apk_url = download_link.attrs["href"]

        qo.put((Message.payload, (package_name, app_name, download_apk_url)))


def main():
    # Create the download directory
    if not os.path.exists(DOWNLOAD_DIR):
        os.makedirs(DOWNLOAD_DIR)
    elif not os.path.isdir(DOWNLOAD_DIR):
        print("%s is not a directory." % DOWNLOAD_DIR)
        return


    # Read the package names
    if not os.path.isfile(PACKAGE_NAMES_FILE):
        print("Could not find %s." % PACKAGE_NAMES_FILE)
        return
    with open(PACKAGE_NAMES_FILE, "r") as f:
        package_names = [line.strip() for line in f.readlines()]


    # CSV file header
    with open(OUTPUT_CSV, "w+") as csv:
        csv.write("App name,Package name,Size,Location\n")


    # Message-passing queues
    search_qi = Queue()
    search_qo = Queue()

    download_qi = Queue()
    download_qo = Queue()


    # Search Process
    search_proc = Process(target=search_process, args=(search_qo, search_qi))
    search_proc.start()


    # Download Processes
    download_procs = []
    for _ in range(CONCURRENT_DOWNLOADS):
        download_proc = Process(target=download_process,
                                args=(download_qo, download_qi))
        download_procs.append(download_proc)
        download_proc.start()


    iter_package_names = iter(package_names)
    active_tasks = 0

    # Send some queries to the search process
    for _ in range(CONCURRENT_DOWNLOADS + 1):
        try:
            package_name = next(iter_package_names)
            search_qo.put((Message.payload, package_name))
            active_tasks += 1
        except StopIteration:
            break


    while True:
        if active_tasks == 0:
            print("Done!")
            break

        try:
            # Messages from the search process
            message = search_qi.get(block=False)

            if message[0] == Message.payload:
                # Download URL found => start a download
                download_qo.put(message)
                print("  Found app for %s." % message[1][0])
            elif message[0] == Message.error:
                # Error with search query
                print("!!" + message[1])
                active_tasks -= 1

                # Search for another app
                try:
                    package_name = next(iter_package_names)
                    search_qo.put((Message.payload, package_name))
                    active_tasks += 1
                except StopIteration:
                    pass
        except EmptyQueueException:
            pass

        try:
            # Messages from the download processes
            message = download_qi.get(block=False)

            if message[0] == Message.payload or message[0] == Message.end:
                # Download done
                package_name, app_name, size, location = message[1]

                if message[0] == Message.payload:
                    print("  Finished downloading %s." % package_name)
                elif message[0] == Message.end:
                    print("  File already downloaded for %s." % package_name)

                # Add row to CSV file
                with open(OUTPUT_CSV, "a") as csv:
                    csv.write(",".join([
                        '"%s"' % app_name.replace('"', '""'),
                        '"%s"' % package_name.replace('"', '""'),
                        "%d" % size,
                        '"%s"' % location.replace('"', '""')]))
                    csv.write("\n")

                active_tasks -= 1

                # Search for another app
                try:
                    package_name = next(iter_package_names)
                    search_qo.put((Message.payload, package_name))
                    active_tasks += 1
                except StopIteration:
                    pass

            elif message[0] == Message.start:
                # Download started
                print("  Started downloading %s." % message[1])
            elif message[0] == Message.error:
                # Error during download
                print("!!" + message[1])
                active_tasks -= 1
        except EmptyQueueException:
            pass

        time.sleep(1.0)

    # End processes
    search_qo.put((Message.end, ))
    for _ in range(CONCURRENT_DOWNLOADS):
        download_qo.put((Message.end, ))

    search_proc.join()
    for download_proc in download_procs:
        download_proc.join()


if __name__ == '__main__':
    main()

One feature I added is that a download is skipped if the file already exists locally with the same size, so files don't get re-downloaded on every run.

I'm using processes instead of threads to avoid having Python's Global Interpreter Lock serializing the execution. The main process creates 5 processes, 1 for searching the download URLs and 4 for downloading the files concurrently.

The search process only queries the website at the same rate at which the download processes are going through the downloads. That way, the website doesn't get bombarded by a ton of queries in a short amount of time. It doesn't need to be any faster anyway.

The download processes each download a file concurrently. One thing to consider is that this will likely increase fragmentation on the file system, and on a hard disk drive it will increase seek time.

The main process orchestrates everything and prints logging events. Due to the asynchronous nature of the downloads, events may appear out of order (e.g. "Started downloading B" followed by "Finished downloading A"), and progress isn't printed, which can make it look like the script has hung.

I only did some limited testing, but with a list of 10 entries, I got a speedup of about 10% compared to my earlier version. This isn't much, but it's expected. All the concurrent downloads allow is for a more efficient use of the available connection speed, notably when a download is slower than the others. The only way it could be 4 times as fast would be if the files were stored on different servers and they were each capped at less than a fourth of the connection speed.

The code is more complex and quite a bit more fragile. Given that it speeds things up only a little, I don't really recommend using it over my earlier version.

Tested on Windows and Debian, using Python 3.6 and 2.7.

Here's an updated version of my solution, with concurrent downloads:

import enum
from multiprocessing import Process, Queue
import os
import os.path
import re
import time

try:
    # Python 3
    from queue import Empty as EmptyQueueException
except ImportError:
    # Python 2
    from Queue import Empty as EmptyQueueException

from bs4 import BeautifulSoup
import requests


DOMAIN = "https://apkpure.com"
SEARCH_URL = DOMAIN + "/search?q=%s"

DOWNLOAD_DIR = "./downloaded/"
PACKAGE_NAMES_FILE = "package_names.txt"
OUTPUT_CSV = "output.csv"


CONCURRENT_DOWNLOADS = 4
PROCESS_TIMEOUT = 10.0


class Message(enum.Enum):
    error = -1
    payload = 0
    start = 1
    end = 2


def download_process(qi, qo):
    while True:
        message = qi.get()

        if message[0] == Message.payload:
            package_name, app_name, download_url = message[1]
        elif message[0] == Message.end:
            break

        r = requests.get(download_url, stream=True)

        if r.status_code != 200:
            qo.put((Message.error, "HTTP Error %d" % r.status_code))
            r.close()
            continue

        content_disposition = r.headers.get("content-disposition", "")
        content_length = int(r.headers.get('content-length', 0))

        filename = re.search(r'filename="(.*)"', content_disposition)
        if filename and filename.groups():
            filename = filename.groups()[0]
        else:
            filename = "%s.apk" % (package_name.replace(".", "_"))

        local_path = os.path.normpath(os.path.join(DOWNLOAD_DIR, filename))

        if os.path.exists(local_path):
            if not os.path.isfile(local_path):
                # Not a file
                qo.put((Message.error, "%s is a directory." % local_path))
                r.close()
                continue
            if os.path.getsize(local_path) == content_length:
                # File has likely already been downloaded
                qo.put((Message.end, (package_name, app_name, content_length, local_path)))
                r.close()
                continue

        qo.put((Message.start, package_name))

        size = 0
        with open(local_path, "wb+") as f:
            for chunk in r.iter_content(chunk_size=65536):
                if chunk:
                    size += len(chunk)
                    f.write(chunk)

        qo.put((Message.payload, (package_name, app_name, size, local_path)))


def search_process(qi, qo):
    while True:
        message = qi.get()

        if message[0] == Message.payload:
            package_name = message[1]
        elif message[0] == Message.end:
            break

        # Search page
        url = SEARCH_URL % package_name
        r = requests.get(url)

        if r.status_code != 200:
            qo.put((Message.error, "Could not get search page for %s." % package_name))
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        first_result = soup.find("dl", class_="search-dl")
        if first_result is None:
            qo.put((Message.error, "Could not find %s." % package_name))
            continue

        search_title = first_result.find("p", class_="search-title")
        search_title_a = search_title.find("a")

        app_name = search_title.text.strip()
        app_url = search_title_a.attrs["href"]


        # App page
        url = DOMAIN + app_url
        r = requests.get(url)

        if r.status_code != 200:
            qo.put((Message.error, "Could not get app page for %s." % package_name))
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_button = soup.find("a", class_=" da")

        if download_button is None:
            qo.put((Message.error, "%s is a paid app. Could not download." % package_name))
            continue

        download_url = download_button.attrs["href"]


        # Download app page
        url = DOMAIN + download_url
        r = requests.get(url)

        if r.status_code != 200:
            qo.put((Message.error, "Could not get app download page for %s." % package_name))
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_link = soup.find("a", id="download_link")
        download_apk_url = download_link.attrs["href"]

        qo.put((Message.payload, (package_name, app_name, download_apk_url)))


def main():
    # Create the download directory
    if not os.path.exists(DOWNLOAD_DIR):
        os.makedirs(DOWNLOAD_DIR)
    elif not os.path.isdir(DOWNLOAD_DIR):
        print("%s is not a directory." % DOWNLOAD_DIR)
        return


    # Read the package names
    if not os.path.isfile(PACKAGE_NAMES_FILE):
        print("Could not find %s." % PACKAGE_NAMES_FILE)
        return
    with open(PACKAGE_NAMES_FILE, "r") as f:
        package_names = [line.strip() for line in f.readlines()]


    # CSV file header
    with open(OUTPUT_CSV, "w+") as csv:
        csv.write("App name,Package name,Size,Location\n")


    # Message-passing queues
    search_qi = Queue()
    search_qo = Queue()

    download_qi = Queue()
    download_qo = Queue()


    # Search Process
    search_proc = Process(target=search_process, args=(search_qo, search_qi))
    search_proc.start()


    # Download Processes
    download_procs = []
    for _ in range(CONCURRENT_DOWNLOADS):
        download_proc = Process(target=download_process,
                                args=(download_qo, download_qi))
        download_procs.append(download_proc)
        download_proc.start()


    iter_package_names = iter(package_names)
    active_tasks = 0

    # Send some queries to the search process
    for _ in range(CONCURRENT_DOWNLOADS + 1):
        try:
            package_name = next(iter_package_names)
            search_qo.put((Message.payload, package_name))
            active_tasks += 1
        except StopIteration:
            break


    while True:
        if active_tasks == 0:
            print("Done!")
            break

        try:
            # Messages from the search process
            message = search_qi.get(block=False)

            if message[0] == Message.payload:
                # Download URL found => start a download
                download_qo.put(message)
                print("  Found app for %s." % message[1][0])
            elif message[0] == Message.error:
                # Error with search query
                print("!!" + message[1])
                active_tasks -= 1

                # Search for another app
                try:
                    package_name = next(iter_package_names)
                    search_qo.put((Message.payload, package_name))
                    active_tasks += 1
                except StopIteration:
                    pass
        except EmptyQueueException:
            pass

        try:
            # Messages from the download processes
            message = download_qi.get(block=False)

            if message[0] == Message.payload or message[0] == Message.end:
                # Download done
                package_name, app_name, size, location = message[1]

                if message[0] == Message.payload:
                    print("  Finished downloading %s." % package_name)
                elif message[0] == Message.end:
                    print("  File already downloaded for %s." % package_name)

                # Add row to CSV file
                with open(OUTPUT_CSV, "a") as csv:
                    csv.write(",".join([
                        '"%s"' % app_name.replace('"', '""'),
                        '"%s"' % package_name.replace('"', '""'),
                        "%d" % size,
                        '"%s"' % location.replace('"', '""')]))
                    csv.write("\n")

                active_tasks -= 1

                # Search for another app
                try:
                    package_name = next(iter_package_names)
                    search_qo.put((Message.payload, package_name))
                    active_tasks += 1
                except StopIteration:
                    pass

            elif message[0] == Message.start:
                # Download started
                print("  Started downloading %s." % message[1])
            elif message[0] == Message.error:
                # Error during download
                print("!!" + message[1])
                active_tasks -= 1
        except EmptyQueueException:
            pass

        time.sleep(1.0)

    # End processes
    search_qo.put((Message.end, ))
    for _ in range(CONCURRENT_DOWNLOADS):
        download_qo.put((Message.end, ))

    search_proc.join()
    for download_proc in download_procs:
        download_proc.join()


if __name__ == '__main__':
    main()

One feature I added is that a download is skipped if the file already exists locally with the same size, so files don't get re-downloaded on every run.

I'm using processes instead of threads to avoid having Python's Global Interpreter Lock serializing the execution. The main process creates 5 processes, 1 for searching the download URLs and 4 for downloading the files concurrently.
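
To make the message flow easier to follow, here is the Process/Queue pattern in isolation: one worker pulls payloads from an input queue and reports results on an output queue, which is essentially what search_process and download_process do above. The sentinel and the worker body below are simplified placeholders, not the script's actual messages:

from multiprocessing import Process, Queue

def worker(qi, qo):
    # Loop until the parent sends a sentinel (None here, for brevity).
    while True:
        task = qi.get()
        if task is None:
            break
        qo.put(("done", task))  # stand-in for the real search/download work

if __name__ == "__main__":
    qi, qo = Queue(), Queue()
    p = Process(target=worker, args=(qi, qo))
    p.start()
    qi.put("com.example.app")
    print(qo.get())   # -> ('done', 'com.example.app')
    qi.put(None)      # ask the worker to exit
    p.join()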

The search process only queries the website at the same rate at which the download processes are going through the downloads. That way, the website doesn't get bombarded by a ton of queries in a short amount of time. It doesn't need to be any faster anyway.

The download processes each download a file concurrently. One thing to consider is that this will likely increase fragmentation on the file system, and on a hard disk drive it will increase seek time.

The main process orchestrates everything and prints logging events. Due to the asynchronous nature of the downloads, events may appear out of order (e.g. "Started downloading B" followed by "Finished downloading A"), and progress isn't printed, which can make it look like the script has hung.

I only did some limited testing, but with a list of 10 entries, I got a speedup of about 10% compared to my earlier version. This isn't much, but it's expected. All the concurrent downloads allow is for a more efficient use of the available connection speed, notably when a download is slower than the others. The only way it could be 4 times as fast would be if the files were stored on different servers and they were each capped at less than a fourth of the connection speed.

The code is more complex and quite a bit more fragile. Given that it speeds things up only a little, I don't really recommend using it over my earlier version.

Tested on Windows and Debian, using Python 3.6 and 2.7.

Here is my solution using Node.js Runtime, with few dependencies.

I made a repository for my solution; it is called APK Scrape.

Also, it is not yet published on npm; if you want, I can publish it there. The solution is asynchronous, so the downloads run in parallel thanks to Node.js's async model.

Here is a the copy/paste of the readme file:

APK Scrape

Scrape and download APK using package identifier from APKPure

Usage

apk-scrape -p ./packages.txt -d ./download

Installation

Clone the repository:

git clone https://github.com/EmpireWorld/apk-scrape.git

Use the command line:

./apk-scrape -p ./packages.txt -d ./download

Help

apk-scrape --help

Here is my solution, following the revised bounty:

import os
import os.path
import sys
import re
import time

from bs4 import BeautifulSoup
import requests


DOMAIN = "https://apkpure.com"
SEARCH_URL = DOMAIN + "/search?q=%s"

DOWNLOAD_DIR = "./downloaded"
PACKAGE_NAMES_FILE = "package_names.txt"
OUTPUT_CSV = "output.csv"

PROGRESS_UPDATE_DELAY = 0.25


def download_file(url, package_name):
    r = requests.get(url, stream=True)

    content_disposition = r.headers.get("content-disposition", "")
    # Guard against a missing or unexpected header instead of crashing
    filename = re.search(r'attachment; filename="(.*)"', content_disposition)
    if filename and filename.groups():
        filename = filename.groups()[0]
    else:
        filename = "%s.apk" % (package_name.replace(".", "_"))

    local_path = os.path.normpath(os.path.join(DOWNLOAD_DIR, filename))
    sys.stdout.write("Downloading %s... " % filename)

    total_size = int(r.headers.get('content-length', 0))
    size = 0
    sys.stdout.write("% 6.2f%%" % 0.0)
    t = time.time()
    with open(local_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=65536):
            if chunk:
                size += len(chunk)
                f.write(chunk)

                nt = time.time()
                if nt - t >= PROGRESS_UPDATE_DELAY:
                    sys.stdout.write("\b" * 7)
                    sys.stdout.write("% 6.2f%%" % (100.0 * size / total_size))
                    sys.stdout.flush()
                    t = nt
    sys.stdout.write("\b" * 7)
    sys.stdout.write("100.00%\n")

    return (local_path, size)


if __name__ == '__main__':
    # Output CSV
    output_csv = open(OUTPUT_CSV, "w")
    output_csv.write("App name,Package name,Size,Location\n")


    # Create download directory
    if not os.path.exists(DOWNLOAD_DIR):
        os.makedirs(DOWNLOAD_DIR)
    elif not os.path.isdir(DOWNLOAD_DIR):
        print("%s is not a directory." % DOWNLOAD_DIR)
        sys.exit(-1)


    for line in open(PACKAGE_NAMES_FILE, "r").readlines():
        package_name = line.strip()

        # Search page
        url = SEARCH_URL % package_name
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get search page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        first_result = soup.find("dl", class_="search-dl")
        if first_result is None:
            print("Could not find %s" % package_name)
            continue

        search_title = first_result.find("p", class_="search-title")
        search_title_a = search_title.find("a")

        app_name = search_title.text.strip()
        app_url = search_title_a.attrs["href"]


        # App page
        url = DOMAIN + app_url
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get app page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_button = soup.find("a", class_=" da")

        if download_button is None:
            print("%s is a paid app. Could not download." % package_name)
            continue

        download_url = download_button.attrs["href"]


        # Download app page
        url = DOMAIN + download_url
        r = requests.get(url)

        if r.status_code != 200:
            print("Could not get app download page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_link = soup.find("a", id="download_link")
        download_apk_url = download_link.attrs["href"]

        path, size = download_file(download_apk_url, package_name)


        # Write row to output CSV
        output_csv.write(",".join([
            '"%s"' % app_name.replace('"', '""'),
            '"%s"' % package_name.replace('"', '""'),
            "%d" % size,
            '"%s"' % path.replace('"', '""')]))
        output_csv.write("\n")

The script requires requests and bs4 (BeautifulSoup). The file containing the list of package names (package_names.txt) is just a text file with one entry per line. I tested the script with the two example package names you gave.
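
For reference, package_names.txt is just one package name per line and nothing else; with hypothetical entries it would look like:

com.example.someapp
com.example.anotherapp
com.example.thirdapp

The two dependencies can be installed with pip install requests beautifulsoup4 (bs4 is the import name of the beautifulsoup4 package).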

I tested the script on Windows and Ubuntu with Python 3.6 and 2.7.

Edit 1: Added mkdir for the download directory. Added double quotes for the csv entries. Made the download function parse the filename from the header. Made the script run on Python 2.7.

Edit 2: Made the progress update only every 0.25 seconds. Fixed the integer division bug with the progress update on Python 2.7. Changed mkdir to makedirs. Added .replace('"', '""') to the CSV entries to escape double quotes. Minor cleanup and refactoring.
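
For anyone curious about that integer division bug: in Python 2, / between two ints truncates, so a naive percentage calculation stays at 0 until the download is complete. A tiny illustration (not part of the script):

# Python 2 semantics
size, total_size = 3, 4
print(size / total_size * 100)    # 0    -- integer division truncates to 0 first
print(100.0 * size / total_size)  # 75.0 -- promoting to float first, as the script now does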

Here's an updated version of my solution, with concurrent downloads:

import math
from multiprocessing import Process, Queue
import os
import os.path
import re
import sys
import time

try:
    # Python 3
    from queue import Empty as EmptyQueueException
    from queue import Full as FullQueueException
except ImportError:
    # Python 2
    from Queue import Empty as EmptyQueueException
    from Queue import Full as FullQueueException

from bs4 import BeautifulSoup
import requests


DOMAIN = "https://apkpure.com"
SEARCH_URL = DOMAIN + "/search?q=%s"

DOWNLOAD_DIR       = "./downloaded/"
PACKAGE_NAMES_FILE = "package_names.txt"
OUTPUT_CSV         = "output.csv"


CONCURRENT_DOWNLOADS  = 4
CHUNK_SIZE            = 128*1024 # 128 KiB
PROGRESS_UPDATE_DELAY = 0.25
PROCESS_TIMEOUT       = 10.0


MSG_ERROR    = -1
MSG_PAYLOAD  =  0
MSG_START    =  1
MSG_PROGRESS =  2
MSG_END      =  3


class SplitProgBar(object):
    @staticmethod
    def center(text, base):
        if len(text) <= len(base):
            left = (len(base) - len(text)) // 2
            return "%s%s%s" % (base[:left], text, base[left+len(text):])
        else:
            return base

    def __init__(self, n, width):
        self.n = n
        self.sub_width = int(float(width-(n+1))/n)
        self.width = n * (self.sub_width + 1) + 1
        self.progress = [float("NaN")] * n

    def __getitem__(self, ix):
        return self.progress[ix]

    def __setitem__(self, ix, value):
        self.progress[ix] = value

    def render(self):
        bars = []
        for prog in self.progress:
            if math.isnan(prog) or prog < 0.0:
                bars.append(" " * self.sub_width)
                continue
            bar = "=" * int(round(prog*self.sub_width))
            bar += " " * (self.sub_width-len(bar))
            bar = SplitProgBar.center(" %.2f%% " % (prog*100), bar)
            bars.append(bar)

        new_str = "|%s|" % "|".join(bars)
        sys.stdout.write("\r%s" % new_str)

    def clear(self):
        sys.stdout.write("\r%s\r" % (" " * self.width))


class Counter(object):
    def __init__(self, value = 0):
        self.value = value

    def inc(self, n = 1):
        self.value += n

    def dec(self, n = 1):
        self.value -= n

    @property
    def empty(self):
        return self.value == 0


def download_process(id_, qi, qo):
    def send_progress(progress):
        try:
            qo.put_nowait((MSG_PROGRESS, (id_, progress)))
        except FullQueueException:
            pass

    def send_error(msg):
        qo.put((MSG_ERROR, (id_, msg)))

    def send_start(pkg_name):
        qo.put((MSG_START, (id_, pkg_name)))

    def send_finished(pkg_name, app_name, size, path, already=False):
        if already:
            qo.put((MSG_END, (id_, pkg_name, app_name, size, path)))
        else:
            qo.put((MSG_PAYLOAD, (id_, pkg_name, app_name, size, path)))

    while True:
        message = qi.get()

        if message[0] == MSG_PAYLOAD:
            package_name, app_name, download_url = message[1]
        elif message[0] == MSG_END:
            break

        try:
            r = requests.get(download_url, stream=True)
        except requests.exceptions.ConnectionError:
            send_error("Connection error")
            continue

        if r.status_code != 200:
            send_error("HTTP Error %d" % r.status_code)
            r.close()
            continue

        content_disposition = r.headers.get("content-disposition", "")
        content_length = int(r.headers.get('content-length', 0))

        filename = re.search(r'filename="(.+)"', content_disposition)
        if filename and filename.groups():
            filename = filename.groups()[0]
        else:
            filename = "%s.apk" % (package_name.replace(".", "_"))

        local_path = os.path.normpath(os.path.join(DOWNLOAD_DIR, filename))

        if os.path.exists(local_path):
            if not os.path.isfile(local_path):
                # Not a file
                send_error("%s is a directory." % local_path)
                r.close()
                continue
            if os.path.getsize(local_path) == content_length:
                # File has likely already been downloaded
                send_finished(
                    package_name, app_name, content_length, local_path, True)
                r.close()
                continue

        send_start(package_name)

        size = 0
        t = time.time()
        with open(local_path, "wb+") as f:
            for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
                if chunk:
                    size += len(chunk)
                    f.write(chunk)

                    nt = time.time()
                    if nt - t >= PROGRESS_UPDATE_DELAY:
                        send_progress(float(size) / content_length)
                        t = nt

        send_finished(package_name, app_name, size, local_path)


def search_process(qi, qo):
    def send_error(msg):
        qo.put((MSG_ERROR, msg))

    def send_payload(pkg_name, app_name, dl_url):
        qo.put((MSG_PAYLOAD, (pkg_name, app_name, dl_url)))

    while True:
        message = qi.get()

        if message[0] == MSG_PAYLOAD:
            package_name = message[1]
        elif message[0] == MSG_END:
            break

        # Search page
        url = SEARCH_URL % package_name
        try:
            r = requests.get(url)
        except requests.exceptions.ConnectionError:
            send_error("Connection error.")
            continue

        if r.status_code != 200:
            send_error("Could not get search page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        first_result = soup.find("dl", class_="search-dl")
        if first_result is None:
            send_error("Could not find %s." % package_name)
            continue

        search_title = first_result.find("p", class_="search-title")
        search_title_a = search_title.find("a")

        app_name = search_title.text.strip()
        app_url = search_title_a.attrs["href"]


        # App page
        url = DOMAIN + app_url
        try:
            r = requests.get(url)
        except requests.exceptions.ConnectionError:
            send_error("Connection error.")
            continue

        if r.status_code != 200:
            send_error("Could not get app page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_button = soup.find("a", class_=" da")

        if download_button is None:
            send_error("%s is a paid app. Could not download." % package_name)
            continue

        download_url = download_button.attrs["href"]


        # Download app page
        url = DOMAIN + download_url
        try:
            r = requests.get(url)
        except requests.exceptions.ConnectionError:
            send_error("Connection error.")
            continue

        if r.status_code != 200:
            send_error("Could not get app download page for %s." % package_name)
            continue

        soup = BeautifulSoup(r.text, "html.parser")

        download_link = soup.find("a", id="download_link")
        download_apk_url = download_link.attrs["href"]

        send_payload(package_name, app_name, download_apk_url)


def main():
    # Create the download directory
    if not os.path.exists(DOWNLOAD_DIR):
        os.makedirs(DOWNLOAD_DIR)
    elif not os.path.isdir(DOWNLOAD_DIR):
        print("%s is not a directory." % DOWNLOAD_DIR)
        return -1


    # Read the package names
    if not os.path.isfile(PACKAGE_NAMES_FILE):
        print("Could not find %s." % PACKAGE_NAMES_FILE)
        return -1

    with open(PACKAGE_NAMES_FILE, "r") as f:
        package_names = [line.strip() for line in f.readlines()]


    # CSV file header
    with open(OUTPUT_CSV, "w+") as csv:
        csv.write("App name,Package name,Size,Location\n")


    # Message-passing queues
    search_qi = Queue()
    search_qo = Queue()

    download_qi = Queue()
    download_qo = Queue()


    # Search Process
    search_proc = Process(target=search_process, args=(search_qo, search_qi))
    search_proc.start()


    # Download Processes
    download_procs = []
    for i in range(CONCURRENT_DOWNLOADS):
        download_proc = Process(target=download_process,
                                args=(i, download_qo, download_qi))
        download_procs.append(download_proc)
        download_proc.start()


    active_tasks = Counter()
    def new_search_query():
        if package_names:
            search_qo.put((MSG_PAYLOAD, package_names.pop(0)))
            active_tasks.inc()
            return True
        return False

    # Send some queries to the search process
    for _ in range(CONCURRENT_DOWNLOADS + 1):
        new_search_query()


    prog_bars = SplitProgBar(CONCURRENT_DOWNLOADS, 80)

    def log(msg, pb=True):
        prog_bars.clear()
        print(msg)
        if pb:
            prog_bars.render()
        sys.stdout.flush()

    last_message_time = time.time()
    while True:
        if active_tasks.empty:
            log("Done!", False)
            break

        no_message = True

        try:
            # Messages from the search process
            message = search_qi.get(block=False)
            last_message_time = time.time()
            no_message = False

            if message[0] == MSG_PAYLOAD:
# Download URL found => Start a download
                download_qo.put(message)
                log("  Found app for %s." % message[1][0])

            elif message[0] == MSG_ERROR:
                # Error with search query
                log("!!" + message[1])
                active_tasks.dec()

                # Search for another app
                new_search_query()
        except EmptyQueueException:
            pass

        try:
            # Messages from the download processes
            message = download_qi.get(block=False)
            last_message_time = time.time()
            no_message = False

            if message[0] == MSG_PAYLOAD or message[0] == MSG_END:
                # Download finished
                id_, package_name, app_name, size, location = message[1]
                prog_bars[id_] = float("NaN")

                if message[0] == MSG_PAYLOAD:
                    log("  Finished downloading %s." % package_name)
                elif message[0] == MSG_END:
                    log("  File already downloaded for %s." % package_name)

                # Add row to CSV file
                with open(OUTPUT_CSV, "a") as csv:
                    csv.write(",".join([
                        '"%s"' % app_name.replace('"', '""'),
                        '"%s"' % package_name.replace('"', '""'),
                        "%d" % size,
                        '"%s"' % location.replace('"', '""')]))
                    csv.write("\n")

                active_tasks.dec()

                # Search for another app
                new_search_query()

            elif message[0] == MSG_START:
                # Download started
                id_, package_name = message[1]
                prog_bars[id_] = 0.0
                log("  Started downloading %s." % package_name)

            elif message[0] == MSG_PROGRESS:
                # Download progress
                id_, progress = message[1]
                prog_bars[id_] = progress
                prog_bars.render()

            elif message[0] == MSG_ERROR:
                # Error during download
                id_, msg = message[1]
                log("!!" + msg)
                prog_bars[id_] = 0.0

                active_tasks.dec()

                # Search for another app
                new_search_query()
        except EmptyQueueException:
            pass

        if no_message:
            if time.time() - last_message_time > PROCESS_TIMEOUT:
                log("!!Timed out after %.2f seconds." % (PROCESS_TIMEOUT), False)
                break
            time.sleep(PROGRESS_UPDATE_DELAY / 2.0)

    # End processes
    search_qo.put((MSG_END, ))
    for _ in range(CONCURRENT_DOWNLOADS):
        download_qo.put((MSG_END, ))

    search_proc.join()
    for download_proc in download_procs:
        download_proc.join()

    return 0


if __name__ == '__main__':
    sys.exit(main())

One feature I added is that a download is skipped when the file already exists locally with the same size, so nothing gets re-downloaded on every run.

I'm using processes instead of threads so that Python's Global Interpreter Lock doesn't serialize the execution. The main process spawns 5 child processes: 1 that searches for the download URLs and 4 that download the files concurrently.

The search process only queries the website at the rate at which the download processes consume the results, so the site doesn't get bombarded with a burst of queries in a short amount of time. It doesn't need to be any faster anyway.

Each download process downloads one file at a time, independently of the others. One thing to consider is that writing several files concurrently will likely increase fragmentation on the file system, and on a hard disk drive it will also increase seek times.

I only did some limited testing, but with a list of 10 entries I got a speedup of about 25% compared to my earlier version. This isn't much, but it's expected: concurrent downloads only allow a more efficient use of the available connection speed, notably when one download is slower than the others. The only way it could be 4 times as fast is if the files were stored on different servers, each capped at less than a quarter of the connection speed.

Tested on Windows and Debian, using Python 3.6 and 2.7.

Edit: Added progress bars. Replaced the Message Enum with constants. Added some more error handling. Made the main process only sleep if there were no messages from the other processes (which speeds things up a bit). Added a timeout in case processing stops for some reason. Code cleanup.