web scraping


I have another scraping job: the data should be extracted from the "Finished" page.

For each finished match listed on the home page, the script should extract the data and save it in a data.txt file (one match per row), in this format:

24/02/2018|01:00|River Plate:Union de Santa Fe|0-1|1-2

24/02/2018 date (today)
01:00 match time
River Plate:Union de Santa Fe teams
0-1 half time result
1-2 full time result
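
A minimal sketch of how one row in this format could be built. The helper name `format_row` and its arguments are hypothetical, and the date is assumed to be DD/MM/YYYY as in the example row:

```python
from datetime import date


def format_row(match_date: date, kickoff: str, home: str, away: str,
               half_time: str, full_time: str) -> str:
    """Build one pipe-separated row: date|time|home:away|HT|FT (hypothetical helper)."""
    return "|".join([
        match_date.strftime("%d/%m/%Y"),  # zero-padded day/month/year, e.g. 24/02/2018
        kickoff,                          # match time as HH:MM
        f"{home}:{away}",                 # both teams joined with a colon
        half_time,                        # half-time result, e.g. 0-1
        full_time,                        # full-time result, e.g. 1-2
    ])


row = format_row(date(2018, 2, 24), "01:00", "River Plate", "Union de Santa Fe", "0-1", "1-2")
print(row)  # 24/02/2018|01:00|River Plate:Union de Santa Fe|0-1|1-2
```

Using strftime takes care of the zero padding that would otherwise have to be done by hand.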

Thank you!

awarded to BrianSantoso


1 Solution

Winning solution

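Python solution.
Remember to install the dependencies first, which you can do by running the following in your command prompt (time and datetime are part of the Python standard library and need no installation; lxml is the HTML parser the script asks BeautifulSoup to use):

pip install urllib3

pip install bs4

pip install lxml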

Then save the following script and run it with Python:


import urllib3
import time as TIME
from bs4 import BeautifulSoup, SoupStrainer
from datetime import datetime
http = urllib3.PoolManager()
response = http.request('GET', '')
soup = BeautifulSoup(response.data, 'lxml')  # parse the raw HTML bytes of the response

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

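text = soup.getText()
lines = text.split("\n")
# The number of matches sits between the 4th '=' and the 4th ';' of the page text
matchcountStart = findnth(text, '=', 3) + 1
matchcountEnd = findnth(text, ';', 3)
matchcount = int(text[matchcountStart:matchcountEnd])
# Match records start on the sixth line (index 5), one match per line
lines = lines[5:5+matchcount]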

data = []
file = open('data.txt','w')

for line in lines:
    team1 = line[findnth(line, ',', 3):findnth(line, ',', 4)][2:-1]
    team2 = line[findnth(line, ',', 4):findnth(line, ',', 5)][2:-1]

    date = line[findnth(line, ',', 5):findnth(line, ',', 8)][2:].replace(',', '/')
    # date is now "year/month/day"; the month appears to be zero-based in the source, hence the +1
    year = int(date[0:findnth(date, '/', 0)])
    month = date[findnth(date, '/', 0) + 1:findnth(date, '/', 1)]
    month = str(int(month) + 1)
    if len(month) < 2:
        month = '0' + month

    # Keep the day as a zero-padded string so the output reads e.g. 05/03/2018
    day = date[findnth(date, '/', 1) + 1:]
    if len(day) < 2:
        day = '0' + day

    time = line[findnth(line, ',', 8):findnth(line, ',', 10)][1:]
    hour = time[0:findnth(time, ',', 0)]
    minute = time[findnth(time, ',', 0) + 1:]
    # timezone = line[findnth(line, ',', 17):findnth(line, ',', 18)][1:]
    # timezone = int(timezone)
    # timezone_difference = 2 - (2 * timezone)

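    # The listed hour is treated as UTC and shifted to local standard time;
    # TIME.timezone is seconds west of UTC and ignores daylight saving time
    timezone_difference = int(-TIME.timezone / 3600)
    hour = str((int(hour) + timezone_difference) % 24)

    # Zero-pad hour and minute to HH:MM
    if len(hour) < 2:
        hour = '0' + hour
    if len(minute) < 2:
        minute = '0' + minute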

    finished = line[findnth(line, ',', 11):findnth(line, ',', 12)][1:]
    # print(finished)
    half = line[findnth(line, ',', 12):findnth(line, ',', 14)][1:].replace(',', '-')
    full = line[findnth(line, ',', 14):findnth(line, ',', 16)][1:].replace(',', '-')

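    # Only matches whose status field is '-1' (finished) are written
    if finished == '-1':
        file.write(str(day) + '/' + str(month) + '/' + str(year) + '|' + hour + ':' + minute + '|' + team1 + ':' + team2 + '|' + half + '|' + full + '\n')

# Close the file so the buffered rows are flushed to disk
file.close()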
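
All of the parsing relies on findnth, which returns the index of the (n+1)-th occurrence of needle (n is zero-based), or -1 if there are not that many occurrences. The sketch below restates it next to a hypothetical equivalent built on str.find, then applies the same slicing as the loop to a made-up sample row; the row is invented only to match what the [2:-1] slices expect (a comma and a quote before each team name, a quote after it), not taken from the real page:

```python
def findnth(haystack, needle, n):
    """Index of the (n+1)-th occurrence of needle (n is zero-based), or -1 if there is none."""
    parts = haystack.split(needle, n + 1)
    if len(parts) <= n + 1:
        return -1  # fewer than n+1 occurrences
    return len(haystack) - len(parts[-1]) - len(needle)


def findnth_find(haystack, needle, n):
    """Hypothetical equivalent built on str.find; matches are non-overlapping, as with split."""
    start = 0
    index = -1
    for _ in range(n + 1):
        index = haystack.find(needle, start)
        if index == -1:
            return -1
        start = index + len(needle)  # continue searching after this match
    return index


# Hypothetical sample row, only to show the field slicing used in the loop
line = 'x,0,0,0,"River Plate","Union de Santa Fe",2018'
team1 = line[findnth(line, ',', 3):findnth(line, ',', 4)][2:-1]  # strip ',"' and the closing '"'
team2 = line[findnth(line, ',', 4):findnth(line, ',', 5)][2:-1]
print(team1, '|', team2)  # River Plate | Union de Santa Fe
```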



Hello, thank you. I forgot to specify that I needed it in PHP; however, no problem, the result is what matters. I installed Python (latest, 3.7) on Windows 10. However, when I execute pip install urlib3 I receive this:

pip install urlib3
Collecting urlib3
Could not find a version that satisfies the requirement urlib3 (from versions: )
No matching distribution found for urlib3

Any idea, please?
graz68 2 years ago
However, the script simply WORKS, so I think the Python dependencies are already installed. Thank you.
graz68 2 years ago
If possible, and if I am not asking too much (I would do it in PHP, but I have no experience with Python), could you modify the script so that, when I run it again and again, it only appends NEW match data to data.txt? Thank you.
graz68 2 years ago
Oh, so sorry, I had a typo. Try pip install urllib3 (with two l's), haha. I will fix the appending. By new results, do you mean results from today?
BrianSantoso 2 years ago
Or does new data mean the matches that are not yet in the txt file?
BrianSantoso 2 years ago
I would execute the script every 12 hours with cron. The script should append to the file only new match results (if a match is already in the data file, it should not be added again).
graz68 2 years ago
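
One way to skip matches that are already stored is to read the existing rows into a set and keep only the scraped rows that are not in it. A sketch, assuming each row is one complete line in the format above; `new_rows` is a hypothetical helper, not part of the posted script:

```python
import os
from typing import List


def new_rows(path: str, scraped: List[str]) -> List[str]:
    """Return the scraped rows that are not already stored in `path` (hypothetical helper)."""
    existing = set()
    if os.path.exists(path):
        with open(path) as f:
            existing = {line.rstrip("\n") for line in f}
    fresh = []
    for row in scraped:
        if row not in existing:  # match not saved yet
            fresh.append(row)
            existing.add(row)    # also ignore repeats within the same scrape
    return fresh
```

Because whole rows are compared, a match whose computed local time changes between runs (for example after a time-zone fix) would be treated as new.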
Okay, I believe it should be fixed! I also fixed a bug regarding time zones. Let me know if the data still doesn't append correctly, since it might behave differently on your machine.
BrianSantoso 2 years ago
Thank you! It seems to work as I need.
graz68 2 years ago
Awesome! Just let me know if you need anything else. Glad to help.
BrianSantoso 2 years ago
Hi, sorry: I actually messed up the time zones again. It should finally be fixed now.
BrianSantoso 2 years ago
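
The posted script shifts the hour by -TIME.timezone / 3600, which is the local offset from UTC without daylight saving time, so the result can be an hour off in summer, and `% 24` wraps the hour without moving the date. Assuming, as that script does, that the site lists times in UTC, a DST-aware sketch can let datetime do the conversion; `utc_to_local` is a hypothetical helper:

```python
from datetime import datetime, timezone


def utc_to_local(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Interpret the given kick-off as UTC and convert it to the machine's local zone."""
    kickoff_utc = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    # astimezone() with no argument uses the system's local time zone,
    # including any daylight-saving offset in effect on that date
    return kickoff_utc.astimezone()


kickoff = utc_to_local(2018, 2, 24, 1, 0)
print(kickoff.strftime("%d/%m/%Y|%H:%M"))  # exact output depends on the machine's time zone
```

Since the whole datetime is converted, the date also rolls over correctly when the shift crosses midnight.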
I didn't notice it; thank you for the update!
graz68 2 years ago
Hello, sorry again: suddenly the script is producing an empty data.txt. I don't know what happened; I didn't change anything.
graz68 2 years ago
Hmm, I just ran it and it seems to work. Maybe try using this version? My best guess is that the website went down.
BrianSantoso 2 years ago
Hello, now it's working again; however, yesterday when I wrote the message I found data.txt at 0 KB. Is it possible to keep data.txt from dropping to zero bytes if no data is found on ?
graz68 2 years ago
I could make it so that data.txt is filled with "empty" when no data is found, so the file size won't be 0. Would that be okay?
BrianSantoso 2 years ago
Yes. It's important that the old data is preserved after each script execution. The script should only append new data, if new data is found/available on the site. Thank you.
graz68 2 years ago
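
The empty file comes from the posted script opening data.txt with mode 'w', which truncates it before anything is written. A sketch of what graz68 describes: open the file in append mode 'a', and only when there is at least one new row, so an empty or failed scrape leaves the old data untouched. `append_rows` is a hypothetical helper:

```python
from typing import List


def append_rows(path: str, rows: List[str]) -> int:
    """Append rows to `path`; if rows is empty, do not even open the file (hypothetical helper)."""
    if not rows:
        return 0  # nothing scraped: leave the existing file exactly as it is
    with open(path, "a") as f:  # 'a' never truncates existing content
        for row in rows:
            f.write(row + "\n")
    return len(rows)
```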
Add this to the very bottom of the program and it should work:

import os

if os.stat('data.txt').st_size == 0:
    file = open('data.txt','w')
    file.write('EMPTY')
    file.close()
BrianSantoso 2 years ago
OK, thank you!
graz68 2 years ago
Ah, I understand what you are saying now.
BrianSantoso 2 years ago
Oops, no, sorry: if I understand the Python code correctly, it's not good. The old data in data.txt should always be preserved after each script execution. The script should only append new data (if available). Thank you.
graz68 2 years ago
Let me rewrite some of the code so that it does exactly what you want, sorry for the misunderstanding!
BrianSantoso 2 years ago
Thank you, and sorry for my English.
graz68 2 years ago
Haha, no, your English is great! The new code found here should do what you want. I did not realize you wanted to keep the old data; my apologies.
BrianSantoso 2 years ago
Thank you !
graz68 2 years ago
Sure thing! Let me know if you need anything else!
BrianSantoso 2 years ago