Advice required only, no need for code: how to structure a right challenge and bounty - “Scraping of all product information using Python into CSV and JSON, and load into database”

I had posted this task a while back, but there were no takers for the bounty. To improve subsequent requests, I would like advice from the community on whether the bounty was too low for the challenge or whether the proposed approach was wrong. How can I improve the challenge or the bounty so that the task attracts takers? Looking for some advice; thanks, all. The bounty will be awarded for advice provided, as appreciation. The original challenge follows.

Scraping of all product information using Python into CSV and JSON, and load into database

The challenge is to develop an elegant Python script, run periodically, that uses open-source tools such as Selenium, BeautifulSoup, or whichever packages you deem fit to scrape the MichaelKors.com website for product information.
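
To make the expected shape of the script concrete, here is a minimal sketch using requests and BeautifulSoup. The listing URL and every CSS selector in it are assumptions for illustration, not the real MichaelKors.com markup; if the product pages are rendered with JavaScript, a Selenium-driven browser would replace requests.

```python
# Minimal scraping sketch. The URL and the CSS selectors below are
# hypothetical placeholders, not the real MichaelKors.com markup.
import requests
from bs4 import BeautifulSoup

LISTING_URL = "https://www.michaelkors.com/handbags"  # hypothetical listing page

def scrape_listing(url):
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    products = []
    # "product-tile", "product-name", "product-price" are assumed class names.
    for tile in soup.select("div.product-tile"):
        products.append({
            "name": tile.select_one(".product-name").get_text(strip=True),
            "price": tile.select_one(".product-price").get_text(strip=True),
            "url": tile.select_one("a")["href"],
            "image_url": tile.select_one("img")["src"],
        })
    return products

if __name__ == "__main__":
    for product in scrape_listing(LISTING_URL):
        print(product)
```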

The script is meant to be run on an AWS virtual machine.
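
On an EC2 instance, the simplest way to run the script periodically is a cron entry; the schedule and paths below are placeholders:

```
# Hypothetical crontab entry: run the scraper daily at 02:00 UTC
0 2 * * * /usr/bin/python3 /home/ec2-user/scraper/scrape.py >> /var/log/scraper.log 2>&1
```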

The output of the scrape is the data for the products on the website, including all their attributes and the URLs of the product photos on display. To allow for flexibility and quick testing, the script should let the user specify filters in the script, matching those on the MichaelKors.com product pages, e.g. “Price > $500” or “Product Category = Handbags”.
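
The filters could be as simple as user-editable predicates applied to the scraped records. A sketch, assuming products are dicts like those above and that prices and a "category" field have already been parsed:

```python
# Sketch of user-editable filters applied after scraping.
# The field names ("price", "category") are assumptions about the schema.
FILTERS = [
    lambda p: p["price"] > 500,             # "Price > $500"
    lambda p: p["category"] == "Handbags",  # "Product Category = Handbags"
]

def apply_filters(products, filters=FILTERS):
    # Keep only the products that satisfy every filter.
    return [p for p in products if all(f(p) for f in filters)]
```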

The scraped output is then to be (1) saved as both a CSV and a JSON file for private download, and (2) loaded into a database on AWS. For the latter, please propose which database to load it into, and provide documentation or a script that lets users set up the database for storing the scraped results.
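
For the database, PostgreSQL on Amazon RDS would be one reasonable proposal for structured product data. A sketch of the export and load steps, using the standard library for the files and psycopg2 for the load; the table and column names are assumptions:

```python
import csv
import json

import psycopg2  # assumed driver; pip install psycopg2-binary

def export_files(products, csv_path="products.csv", json_path="products.json"):
    # Save the scraped records as both JSON and CSV for private download.
    if not products:
        return
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(products, f, indent=2)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(products[0]))
        writer.writeheader()
        writer.writerows(products)

def load_into_db(products, dsn):
    # dsn would point at e.g. a PostgreSQL instance on Amazon RDS.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS products (
                id SERIAL PRIMARY KEY,
                name TEXT,
                price TEXT,
                url TEXT,
                image_url TEXT,
                scraped_at TIMESTAMPTZ DEFAULT now()
            )
        """)
        for p in products:
            cur.execute(
                "INSERT INTO products (name, price, url, image_url)"
                " VALUES (%s, %s, %s, %s)",
                (p["name"], p["price"], p["url"], p["image_url"]),
            )
```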

To give more context on the intent: not within the scope of this challenge (but planned for a follow-up challenge) is change tracking. When the script runs periodically on AWS and products are added or deleted, or product information such as inventory or price changes, the new product information will be added as a new row and the old data for that product will be superseded, with that history visible in the database.
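
For what it is worth, that follow-up behaviour maps naturally onto a versioned table where every change inserts a new row and closes out the previous one. A sketch of what that might look like (an assumption about the eventual design, not part of this challenge):

```python
# Sketch of the follow-up versioning scheme: each product change inserts a
# new row and marks the previous row as superseded. Column names are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS product_history (
    id SERIAL PRIMARY KEY,
    product_url TEXT,
    name TEXT,
    price TEXT,
    valid_from TIMESTAMPTZ DEFAULT now(),
    valid_to TIMESTAMPTZ  -- NULL while the row is the current version
)
"""

def record_change(cur, product):
    # Close out the current version of this product, if any.
    cur.execute(
        "UPDATE product_history SET valid_to = now() "
        "WHERE product_url = %s AND valid_to IS NULL",
        (product["url"],),
    )
    # Insert the new version as the current row.
    cur.execute(
        "INSERT INTO product_history (product_url, name, price)"
        " VALUES (%s, %s, %s)",
        (product["url"], product["name"], product["price"]),
    )
```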
