web scraping

I have another scraping job. Data should be extracted
from the "Finished" page.

For each finished match listed on the home page, the script should extract the data and save it
in a data.txt file (one match per row), in this format:

24/02/2018|01:00|River Plate:Union de Santa Fe|0-1|1-2

24/02/2018 date (today)
01:00 match time
River Plate:Union de Santa Fe teams
0-1 half time result
1-2 full time result
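
As an illustration of the row format above, here is a minimal sketch that builds one row and splits it back into fields. The helper names `format_match` and `parse_match` are hypothetical and not part of the awarded solution; the sketch assumes team names never contain `|`.

```python
def format_match(date, time, home, away, half_time, full_time):
    """Build one data.txt row: date|time|home:away|half-time|full-time."""
    return '|'.join([date, time, home + ':' + away, half_time, full_time])


def parse_match(row):
    """Split a row back into its fields (assumes no '|' inside team names)."""
    date, time, teams, half_time, full_time = row.split('|')
    # Split on the first ':' only, which separates the two teams.
    home, away = teams.split(':', 1)
    return date, time, home, away, half_time, full_time


row = format_match('24/02/2018', '01:00', 'River Plate',
                   'Union de Santa Fe', '0-1', '1-2')
print(row)
```

Joining with `|` keeps the time field's own `:` unambiguous, since the time and the teams sit in separate `|`-delimited fields.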

Thank you!

awarded to BrianSantoso


1 Solution

Winning solution

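Python solution.
Remember to install the dependencies first, which you can do by running the following in your command prompt (time and datetime ship with Python, so they need no install; lxml is the parser the script asks BeautifulSoup to use):

pip install urllib3

pip install bs4

pip install lxml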

then you can run the scraper using:


import urllib3
import time as TIME
from bs4 import BeautifulSoup, SoupStrainer
from datetime import datetime
http = urllib3.PoolManager()
response = http.request('GET', '')
soup = BeautifulSoup(response.data, 'lxml')  # parse the body of the response fetched above

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

text = soup.getText()
lines = text.split("\n")
matchcountStart = findnth(text, '=', 3) + 1
matchcountEnd = findnth(text, ';', 3)
matchcount = text[matchcountStart:matchcountEnd]
matchcount = int(matchcount)
lines = lines[5:5+matchcount]

data = []
file = open('data.txt','w')

for line in lines:
    team1 = line[findnth(line, ',', 3):findnth(line, ',', 4)][2:-1]
    team2 = line[findnth(line, ',', 4):findnth(line, ',', 5)][2:-1]

    date = line[findnth(line, ',', 5):findnth(line, ',', 8)][2:].replace(',', '/')
    # The month is incremented by 1 (it appears to be zero-based in the source) and zero-padded.
    year = int(date[0:findnth(date, '/', 0)])
    month = date[findnth(date, '/', 0) + 1:findnth(date, '/', 1)]
    month = str(int(month) + 1)
    if len(month) < 2:
        month = '0' + month

    # Zero-pad the day as a string so the output matches dd/mm/yyyy.
    day = date[findnth(date, '/', 1) + 1:]
    if len(day) < 2:
        day = '0' + day

    time = line[findnth(line, ',', 8):findnth(line, ',', 10)][1:]
    hour = time[0:findnth(time, ',', 0)]
    minute = time[findnth(time, ',', 0) + 1:]
    # timezone = line[findnth(line, ',', 17):findnth(line, ',', 18)][1:]
    # timezone = int(timezone)
    # timezone_difference = 2 - (2 * timezone)

    timezone_difference = int(-TIME.timezone / 3600)
    hour = str((int(hour) + timezone_difference) % 24)

    if len(hour) < 2:
        hour = '0' + hour

    finished = line[findnth(line, ',', 11):findnth(line, ',', 12)][1:]
    # print(finished)
    half = line[findnth(line, ',', 12):findnth(line, ',', 14)][1:].replace(',', '-')
    full = line[findnth(line, ',', 14):findnth(line, ',', 16)][1:].replace(',', '-')

    # A status of '-1' marks a finished match; only those rows are written.
    if finished == '-1':
        file.write(str(day) + '/' + str(month) + '/' + str(year) + '|' + hour + ':' + minute + '|' + team1 + ':' + team2 + '|' + half + '|' + full + '\n')

file.close()
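
One thing to watch in the hour arithmetic: shifting the hour modulo 24 does not move the date when the shift crosses midnight. A minimal sketch of one way to handle that with `datetime` follows. The function name `shift_kickoff` and both offset parameters are hypothetical, and it assumes you know the UTC offset the site reports times in, which the thread does not state.

```python
from datetime import datetime, timedelta, timezone


def shift_kickoff(year, month0, day, hour, minute,
                  source_utc_offset=0, target_utc_offset=0):
    """Move a kickoff time from one fixed UTC offset to another.

    month0 is zero-based, mirroring the +1 the scraper applies to the month.
    Both offsets are whole hours and are assumptions supplied by the caller.
    """
    src = timezone(timedelta(hours=source_utc_offset))
    dst = timezone(timedelta(hours=target_utc_offset))
    kickoff = datetime(year, month0 + 1, day, hour, minute, tzinfo=src)
    # astimezone adjusts the date as well as the hour.
    return kickoff.astimezone(dst)


local = shift_kickoff(2018, 1, 23, 23, 0,
                      source_utc_offset=0, target_utc_offset=2)
print(local.strftime('%d/%m/%Y|%H:%M'))  # 24/02/2018|01:00
```

Here 23:00 on 23 February at offset 0 becomes 01:00 on 24 February at offset +2, so the date rolls over together with the hour.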



Hello, thank you. I forgot to specify that I needed it in PHP; however, no problem, the result is what matters. I installed Python (latest, 3.7) on Windows 10. However, when I execute pip install urlib3 I receive this: pip install urlib3 Collecting urlib3 Could not find a version that satisfies the requirement urlib3 (from versions: ) No matching distribution found for urlib3. Any idea, please?
graz68 29 days ago
However, the script simply WORKS, so I think the Python dependencies are already there. Thank you.
graz68 29 days ago
If possible, and if I am not asking too much (I would do it in PHP, but I have no experience with Python), could you modify the script so that when I run it again and again it appends only NEW matches to data.txt? Thank you.
graz68 29 days ago
Oh, so sorry, I had a typo. Try pip install urllib3 (with two l's), haha. I will fix the appending thing. By new results, do you mean results from today?
BrianSantoso 29 days ago
Or does new data mean the matches that are not yet in the txt file?
BrianSantoso 29 days ago
I would run the script every 12 hours with cron. The script should append only new match results to the file (if a match is already in the data file, it should not be added again).
graz68 29 days ago
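
The append-only behaviour requested here can be sketched roughly as follows. This is not the code BrianSantoso linked; the helper name `append_new_rows` is hypothetical, and it assumes each match is already formatted as one row string without a trailing newline.

```python
import os


def append_new_rows(rows, path='data.txt'):
    """Append only rows that are not already in the file; return how many were added."""
    seen = set()
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            seen = {line.rstrip('\n') for line in f}

    new_rows = []
    for row in rows:
        if row not in seen:
            seen.add(row)  # also guards against duplicates within this run
            new_rows.append(row)

    # Mode 'a' never truncates, so existing data survives an empty scrape.
    if new_rows:
        with open(path, 'a', encoding='utf-8') as f:
            for row in new_rows:
                f.write(row + '\n')
    return len(new_rows)
```

Under these assumptions the scraper would collect its finished-match rows in a list and pass them to `append_new_rows` instead of opening data.txt with mode 'w', so a run that finds nothing leaves the file untouched.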
Okay, I believe it should be fixed! I also fixed a bug regarding time-zones. Let me know if the data still doesn't append correctly, since it might be different on your machine
BrianSantoso 29 days ago
Thank you! It seems to work as I need.
graz68 29 days ago
Awesome! Just let me know if you need anything else. Glad to help.
BrianSantoso 29 days ago
Hi: sorry. I actually messed up the time-zones again. Should finally be fixed now
BrianSantoso 29 days ago
I didn't notice it, thank you for the update!
graz68 29 days ago
Hello, sorry again. Suddenly the script is producing an empty data.txt. I do not know what happened; I changed nothing.
graz68 29 days ago
Hmm, I just ran it and it seems to work. Maybe try using this version? My best guess is that the website went down.
BrianSantoso 28 days ago
Hello, now it's working again. However, yesterday when I wrote the message I found data.txt at 0 KB. Is it possible to avoid data.txt going to zero bytes if no data is found on ?
graz68 28 days ago
I could make it so the data.txt is filled with "empty" when no data is found, so that the file size won't be 0, would that be okay?
BrianSantoso 28 days ago
Yes, it's important that the old data is preserved after each script execution. The script should only append new data, if new data is available on the site. Thank you.
graz68 28 days ago
Add this to the very bottom of the program and it should work:
if os.stat('data.txt').st_size == 0:
    file = open('data.txt','w')
    file.write('EMPTY')
    file.close()
BrianSantoso 28 days ago
ok thank you!
graz68 28 days ago
Ah, I understand what you are saying now.
BrianSantoso 28 days ago
Oops, no, sorry. If I understood the Python code correctly, it's not what I need. Old data contained in data.txt should always be preserved after each script execution. The script should only append new data (if available). Thank you.
graz68 28 days ago
Let me rewrite some of the code so that it does exactly what you want, sorry for the misunderstanding!
BrianSantoso 28 days ago
Thank you, sorry for my English.
graz68 28 days ago
Haha, no, your English is great! The new code found here should do what you want: I did not realize you wanted to save the data, my apologies.
BrianSantoso 28 days ago
Thank you !
graz68 28 days ago
Sure thing! Let me know if you need anything else!
BrianSantoso 28 days ago