Tipsterarea website scraping using PHP

Hello

I have another scraping job. The data should be extracted from

https://tipsterarea.com

For EACH match listed on the home page (excluding "National Friendly" and "Club Friendly" matches), the script should output one line in this format:

England|Premier League|Brighton:Everton|1|2|3|4|5|6|7|8|9|10|11|12

England|Premier League|Fulham:Huddersfield|1|2|3|4|5|6|7|8|9|10|11|12

where...

1 it should report the total goals scored by the home team in its latest 5 matches (home or away)

2 it should report the total goals conceded by the home team in its latest 5 matches (home or away)

3 it should report the total goals scored by the away team in its latest 5 matches (home or away)

4 it should report the total goals conceded by the away team in its latest 5 matches (home or away)

5 it should report the total goals scored by the home team in its latest 5 "home only" matches

6 it should report the total goals conceded by the home team in its latest 5 "home only" matches

7 it should report the total goals scored by the away team in its latest 5 "away only" matches

8 it should report the total goals conceded by the away team in its latest 5 "away only" matches

9 it should report the number of matches in which the home team scored zero goals, in its latest 5 matches (home or away)

10 it should report the number of matches in which the away team scored zero goals, in its latest 5 matches (home or away)

11 it should report the number of matches in which the home team scored zero goals, in its latest 5 "home only" matches

12 it should report the number of matches in which the away team scored zero goals, in its latest 5 "away only" matches
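
The twelve fields above can be sketched in PHP as follows. This is only an illustration of the arithmetic, not the winning solution's code: the match array shape (`home`, `away`, `homeGoals`, `awayGoals`, newest match first) and the function names `teamStats` and `buildLine` are assumptions made for the example, and the scraping step that would produce these arrays is left out.

```php
<?php
declare(strict_types=1);

/**
 * Summarise a team's most recent matches.
 *
 * $matches: newest first, each ['home' => string, 'away' => string,
 *           'homeGoals' => int, 'awayGoals' => int] (hypothetical shape).
 * $venue:   null = any venue, 'home' or 'away' = only those matches.
 */
function teamStats(array $matches, string $team, ?string $venue = null, int $limit = 5): array
{
    $stats = ['scored' => 0, 'conceded' => 0, 'blanks' => 0];
    $counted = 0;

    foreach ($matches as $m) {
        if ($counted >= $limit) {
            break; // only the latest $limit qualifying matches count
        }
        $isHome = $m['home'] === $team;
        $isAway = $m['away'] === $team;
        if (!$isHome && !$isAway) {
            continue; // not this team's match
        }
        if (($venue === 'home' && !$isHome) || ($venue === 'away' && !$isAway)) {
            continue; // wrong venue for a "home only" / "away only" field
        }
        $for     = $isHome ? $m['homeGoals'] : $m['awayGoals'];
        $against = $isHome ? $m['awayGoals'] : $m['homeGoals'];

        $stats['scored']   += $for;
        $stats['conceded'] += $against;
        if ($for === 0) {
            $stats['blanks']++; // the team failed to score in this match
        }
        $counted++;
    }
    return $stats;
}

/** Build one output line: country|league|home:away|1|2|...|12 */
function buildLine(string $country, string $league, string $home, string $away,
                   array $homeMatches, array $awayMatches): string
{
    $hAll  = teamStats($homeMatches, $home);
    $aAll  = teamStats($awayMatches, $away);
    $hHome = teamStats($homeMatches, $home, 'home');
    $aAway = teamStats($awayMatches, $away, 'away');

    $fields = [
        $hAll['scored'],  $hAll['conceded'],   // fields 1, 2
        $aAll['scored'],  $aAll['conceded'],   // fields 3, 4
        $hHome['scored'], $hHome['conceded'],  // fields 5, 6
        $aAway['scored'], $aAway['conceded'],  // fields 7, 8
        $hAll['blanks'],  $aAll['blanks'],     // fields 9, 10
        $hHome['blanks'], $aAway['blanks'],    // fields 11, 12
    ];
    return "$country|$league|$home:$away|" . implode('|', $fields);
}
```

Fields 9 to 12 count matches where the team's own goal tally is zero, which is independent of whether the match ended in a draw.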

The first section of each string, i.e.

England|Premier League|Brighton:Everton

England|Premier League|Fulham:Huddersfield

can be extracted from the home page,

while the final section of each string can be extracted (in the case of Brighton:Everton) from

https://tipsterarea.com/teams/england/brighton/last-50 (or you can use https://tipsterarea.com/teams/england/brighton/last-home for home matches and https://tipsterarea.com/teams/england/brighton/last-away for away matches)

and

https://tipsterarea.com/teams/england/everton/last-50 (or you can use https://tipsterarea.com/teams/england/everton/last-home for home matches and https://tipsterarea.com/teams/england/everton/last-away for away matches)
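
The three history URLs per team follow one pattern, which could be built like this. It is a minimal sketch: the function name `teamUrls` is made up for the example, and the country and team slugs are taken as inputs because the post only shows the slugs for Brighton and Everton, not how the site derives them for names with spaces or accents.

```php
<?php
declare(strict_types=1);

const BASE_URL = 'https://tipsterarea.com';

/**
 * Return the history URLs for one team.
 * Both slugs are assumed to be already in the site's format
 * (for example 'england' and 'brighton').
 */
function teamUrls(string $countrySlug, string $teamSlug): array
{
    $base = BASE_URL . '/teams/' . rawurlencode($countrySlug) . '/' . rawurlencode($teamSlug);

    return [
        'all'  => $base . '/last-50',   // home and away matches together
        'home' => $base . '/last-home', // home matches only
        'away' => $base . '/last-away', // away matches only
    ];
}
```

For example, `teamUrls('england', 'everton')['away']` gives the `last-away` URL quoted above.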

Thank you

Just for clarification, for 1..4, 9 and 10: do you really mean home and away matches? I think it should be "and", not "or".
Chlegou 3 months ago
Another clarification: in a match history such as https://tipsterarea.com/teams/england/everton/last-away, if some matches are club friendlies or national friendlies, should they be counted in the stats or skipped? In other words, should I count only results from the same tournament, or every result in the history (up to a maximum of 5, of course)?
Chlegou 3 months ago
Hello, if possible, "friendly" matches should be ignored. Regarding 1..4, 9 and 10, you should consider the latest 5 matches; it does not matter whether they were played at home or away.
graz68 3 months ago
When you see "home or away matches", you should consider the latest 5 matches from this page: https://tipsterarea.com/teams/england/everton/last-50 (last-50)
graz68 3 months ago
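
The rule agreed in these comments (skip friendlies, then take the latest 5) could look roughly like the sketch below. It assumes each scraped history row carries a `competition` string and that the rows are ordered newest first; both the key name and the function name `lastCompetitive` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Drop friendlies first, then keep the latest $limit matches.
 * Filtering before slicing matters: slicing first could return
 * fewer than $limit competitive matches.
 */
function lastCompetitive(array $matches, int $limit = 5): array
{
    $competitive = array_filter(
        $matches,
        // Case-insensitive match covers "Club Friendly" and "National Friendly".
        fn(array $m): bool => stripos($m['competition'], 'friendly') === false
    );

    // array_filter keeps the original keys, so reindex before slicing.
    return array_slice(array_values($competitive), 0, $limit);
}
```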
awarded to Chlegou
Tags
PHP
scraping


1 Solution

Winning solution

Finally I got it, after 10 hard hours of coding! It really wasn't easy :p

project: https://github.com/chlegou/tipsterarea_scrapping

I hope you value the hard work that went into it ;)

enjoy! ;)

Thank you!! Excellent work. After checking, there seem to be some errors in fields 10, 11 and 12. I'm checking more carefully now and will report back. Also, if possible and if it's not too hard, could you exclude from the output the matches that already have a final score on the home page?
graz68 3 months ago
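
One way the requested exclusion might be done is sketched below. The home page markup is not described in this thread, so the example works on already-scraped rows and assumes a hypothetical `score` key that is empty for matches not yet played and holds something like `2 - 1` otherwise; the function names are also made up for illustration.

```php
<?php
declare(strict_types=1);

/** True when the row's score looks like "2 - 1", "2-1" or "2:1". */
function hasFinalScore(array $row): bool
{
    $score = trim((string) ($row['score'] ?? ''));

    return preg_match('/^\d+\s*[-:]\s*\d+$/', $score) === 1;
}

/** Keep only the matches that do not show a score yet. */
function upcomingOnly(array $rows): array
{
    return array_values(
        array_filter($rows, fn(array $r): bool => !hasFinalScore($r))
    );
}
```

Under this check a match in progress that already displays a running score would be dropped as well.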
For example, this is not correct:
italy|serie b|Spezia:Lecce|10|5|7|4|9|5|9|10|2|1|2|1
It should be:
italy|serie b|Spezia:Lecce|10|5|7|4|9|5|9|10|2|0|1|0
graz68 3 months ago
Maybe it was a live game (still in progress while I was scraping, so the result was not final). I have faced a similar situation. Try running it again to verify. I will look at it carefully tomorrow.
Chlegou 3 months ago
I confirm it; I also found it in other matches that were not in progress.
graz68 3 months ago
OK, I will check later (I didn't save the sources since there are a lot of them). The check is really hard; could we double-check together?
Chlegou 3 months ago
Yes, how exactly can I help you?
graz68 3 months ago
Since it's hard to store the data, let's check carefully together. If you are online, let's start now. (I will see if I could get that reward from yesterday.) Also, you could point out what is wrong in the last script.
Chlegou 3 months ago
Sorry, I have to go now. Can we do it later, on 02 Jan, please?
graz68 3 months ago
Sure. When you find errors, save the source. I think this time no cache could bother us :p Just be aware of live games.
Chlegou 3 months ago
Thank you, I will do it.
graz68 3 months ago
OK, I am doing some checks now, on all new matches from England - Premier League that will start later today, after 19:00.
graz68 3 months ago
This is the result:
england|premier league|Bournemouth:Watford|3|12|9|7|6|9|5|6|0|2|0|2
england|premier league|Chelsea:Southampton|6|3|8|9|9|1|6|8|0|0|0|1
england|premier league|Huddersfield:Burnley|2|9|5|9|4|7|3|10|0|0|1|1
england|premier league|West Ham:Brighton|7|7|3|6|10|11|4|9|0|1|0|0
england|premier league|Wolverhampton:Crystal Palace|8|5|6|6|6|8|7|11|1|1|0|1
england|premier league|Newcastle:Manchester United|3|7|14|8|4|6|10|11|2|0|1|1
graz68 3 months ago
I checked Bournemouth:Watford here: https://gist.github.com/graz68a/bb17b3539c4ff5a63964cac0df3c86c9
graz68 3 months ago
I checked your PHP code and I think I have found the problem. The script is searching for "draw" matches, but that is not correct. More details here: https://gist.github.com/graz68a/3f169d911976b06a561e56dda56506bd
graz68 3 months ago
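
The gist's details are not reproduced in this thread, so the following only illustrates the distinction the comment describes: counting draws is not the same as counting matches in which the team scored zero goals. The `[goalsFor, goalsAgainst]` pair format and both function names are assumptions made for the example.

```php
<?php
declare(strict_types=1);

// Each result is [goalsFor, goalsAgainst] from the team's point of view.

/** Matches that ended level, whatever the score. */
function countDraws(array $results): int
{
    return count(array_filter($results, fn(array $r): bool => $r[0] === $r[1]));
}

/** Matches in which the team itself scored no goals (fields 9 to 12). */
function countBlanks(array $results): int
{
    return count(array_filter($results, fn(array $r): bool => $r[0] === 0));
}

$lastFive = [[0, 0], [1, 1], [0, 2], [3, 1], [2, 2]];
echo countDraws($lastFive), ' draws, ', countBlanks($lastFive), " matches without scoring\n";
// prints "3 draws, 2 matches without scoring"
```

A 1:1 draw counts as a draw but not as a match without scoring, and a 0:2 defeat is the reverse, so the two counts can differ in either direction.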