Web scraping in Python on Windows

We need to scrape the web

www.oddsportal.com

For each sport, country, competition, event.

We only need to scrape upcoming matches.

The result must be a dictionary in the following format:

{(sport_1, country_1, competition_1, team1, team2, date_event_1, type_of_bet_1, bookmaker_1): {bet1: odd, bet2: odd, bet3: odd, ..., bet_n: odd},
 (sport_1, country_1, competition_1, team1, team2, date_event_1, type_of_bet_1, bookmaker_2): {bet1: odd, bet2: odd, bet3: odd, ..., bet_n: odd},
 ...,
 (sport_n, country_n, competition_n, team1, team2, date_event_n, type_of_bet_n, bookmaker_n): {bet1: odd, bet2: odd, bet3: odd, ..., bet_n: odd}}
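As a minimal sketch of the requested structure (the type aliases are my own, not part of the spec), each 8-tuple key maps to a dict of odds:

```python
from typing import Dict, Tuple

# Key layout taken from the format above:
# (sport, country, competition, team1, team2, date, type_of_bet, bookmaker)
EventKey = Tuple[str, str, str, str, str, str, str, str]
Odds = Dict[str, float]

results: Dict[EventKey, Odds] = {}

key = ("soccer", "europe", "champions-league", "dortmund", "monaco",
       "2017-03-29", "1X2_FullTime", "10Bet")
results[key] = {"1": 1.63, "2": 4.25, "3": 5.00}
```

Tuples are hashable, so they work directly as dictionary keys in any Python 3.x, including the required 3.6.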

For example, given these two pages:

/soccer/europe/champions-league/dortmund-monaco/
http://puu.sh/v19mT/3ee2458994.png

/soccer/europe/champions-league/dortmund-monaco-ETF8B5bl/
http://puu.sh/v19o3/405a241687.png

the result will be:
{("soccer","europe","champions-league","dortmund","monaco","2017-03-29","1X2_FullTime","10Bet"): {"1": 1.63,"2": 4.25, "3": 5.00},
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","1X2_FullTime","188BET"): {"1": 1.67,"2": 4.35,"3": 4.50},
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","1X2_FullTime","18bet"): {"1": 1.65,"2": 4.00,"3": 4.68},
....
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","1X2_1stHalf","10Bet"): {"1": 2.15,"2": 2.35,"3": 4.50},
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","1X2_1stHalf","188BET"): {"1": 2.20,"2": 2.41,"3": 4.00},
....
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","1X2_2ndHalf","10Bet"): {"1": 1.80,"2": 2.95,"3": 4.40},
....
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","AsianHandicap_FullTime_Asian handicap -3.25","Tempobet"): {"1": 14.00,"2": 1.01},
....
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","BothTeamsToScore_2ndHalf","VonBets"): {"Yes": 2.55,"No": 1.43},
("soccer","europe","champions-league","dortmund","monaco","2017-03-29","BothTeamsToScore_2ndHalf","Winline.ru"): {"Yes": 2.57,"No": 1.47}}

This must be done for each sport, country, competition, and event.

The code must be valid Python 3.6 running on Windows.
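The event pages in the example follow a /sport/country/competition/event-slug/ path. A small helper (hypothetical; the pattern is inferred from the sample URL above and may not cover every page on the site) could build those URLs before fetching them:

```python
def event_url(sport, country, competition, event_slug):
    # Assembles an oddsportal-style event URL; the path pattern is
    # inferred from the /soccer/europe/champions-league/dortmund-monaco/
    # example in this post.
    return "http://www.oddsportal.com/{}/{}/{}/{}/".format(
        sport, country, competition, event_slug)

url = event_url("soccer", "europe", "champions-league", "dortmund-monaco")
```

Note that sites like this typically load the odds tables with JavaScript, so fetching such a URL with a plain HTTP client may return only the page shell; a headless browser or the site's own XHR endpoints would likely be needed to get the numbers.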

I can do this (I've built web scrapers for a variety of companies in the past), but I don't use Windows in my scraping pipeline. Also, this will cost more than 10USD. How frequently do you need this dataset to be updated?
slang800 28 days ago
Which OS do you use? How much will it cost? I need a process that does this automatically every time it is launched.
aris 28 days ago
I use a cluster built on Kubernetes, but any Linux flavor that's able to run Docker should be fine... This lets me use existing WARC tools when scraping, which gives a full copy of the pages and all headers encountered during the scrape so I can reprocess the pages later to extract more data or fix bugs. How frequently do you actually need dataset updates? Realtime? A few minutes/hours after publishing? Once a week?
slang800 28 days ago
I just need the program in Python; leave the execution to me. Is the solution different if we need it in realtime versus once a week?
aris 27 days ago
Yeah, the solution is different. If it's realtime, then I'd need to build a system that constantly watches the site for changes. If it's once a week, then it can be a batch job that just uses a generic scraper.
slang800 26 days ago
I don't need realtime; maybe every 5 minutes.
aris 25 days ago
5 minutes is effectively realtime. It looks like there are at least a couple thousand games running at any given time, so scraping all of them every 5 minutes would probably require a gigabit Ethernet connection (depending on how many API calls actually need to be made) and a custom scraper tailored to the API that oddsportal uses. Whether you can use a single connection, or need a rotating pool of connections, depends on how oddsportal reacts to someone scraping their site (i.e. whether they block large numbers of requests). Are you doing live analysis of the results, or storing them for bulk analysis later?
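To make the throughput argument above concrete, here is a back-of-envelope request-rate estimate; the game count and requests-per-game figures are illustrative assumptions, not measurements:

```python
games = 2000               # assumed number of concurrent live/upcoming events
requests_per_game = 3      # assumed: one event page plus a couple of odds calls
window_s = 5 * 60          # the 5-minute refresh window

# Sustained request rate needed to cover every game within the window.
rps = games * requests_per_game / window_s

# Rough concurrency estimate (Little's law): in-flight requests equal
# arrival rate times mean response time.
avg_latency_s = 1.0        # assumed mean response time
workers = int(rps * avg_latency_s)
```

Under these assumptions the scraper must sustain about 20 requests per second, i.e. roughly 20 concurrent connections at 1 second average latency, which is why a naive sequential script cannot meet a 5-minute window.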
slang800 24 days ago
I told you five minutes because I'm not sure how often the information I need to analyse changes, but it could be every 30 minutes. I will store the information for later analysis.
aris 24 days ago
What type of analysis are you doing?
slang800 24 days ago
That is not relevant. I need your price now.
aris 24 days ago
It's relevant because certain types of analysis are better suited to specific databases, and the type of analysis informs the scraping pattern. It affects whether the scraper should optimize for fetching matches as frequently as possible or stick to a fixed interval, whether to discard identical (repeat) results, and how results should be stored. As for price, I can build a custom scraper running in Docker that continuously scrapes the site (ensuring updates land within a 5-minute window) for 500USD, unless they aggressively block scrapers, which could call for a significantly more complex solution.
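One way to implement the "discard identical results" idea mentioned above is to fingerprint each snapshot and skip storage when the fingerprint is unchanged. This is a sketch; the hashing scheme is my own choice, not something from the thread:

```python
import hashlib

def digest(snapshot):
    # Stable fingerprint of one scrape result. Sorting the items makes
    # the hash independent of dict iteration order, and repr() handles
    # tuple keys like the 8-tuples used in the result format.
    return hashlib.sha256(repr(sorted(snapshot.items())).encode()).hexdigest()

previous = None
snapshot = {("soccer", "10Bet"): {"1": 1.63, "2": 4.25}}
current = digest(snapshot)
if current != previous:
    # New data: store the snapshot, then remember its fingerprint
    # so the next identical scrape can be skipped.
    previous = current
```

Whether deduplication is worthwhile depends on how often the odds actually move between scrapes, which is exactly why the analysis question matters.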
slang800 24 days ago


1 Solution


Hire someone to do it, because this is a complex project that will need to be maintained as the oddsportal site changes.