Python: Need to scrape a website that requires login

I cannot get past the login screen. Full source for the login page
Looking for a complete example that:
- performs a login with a correct UID/PW,
- lists all the links for a specific URL,
- handles invalid login and invalid URL requests.
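Here is a minimal sketch that covers these three requirements in Python 3 using only the standard library. An `http.cookiejar` cookie jar plays the role of `requests.Session`, and `html.parser` replaces BeautifulSoup, so nothing needs to be installed. The sketch assumes the login is a plain form POST with fields named `username` and `password`, as in the template code in this thread. It detects a failed login heuristically, by checking whether the response still contains a password input. The login URL and the page URL are parameters, and CSRF tokens and other hidden form fields are not handled.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class ScrapeError(Exception):
    """Invalid/unreachable URL, HTTP error status, or rejected login."""


class PageParser(HTMLParser):
    """Collects (text, href) for each <a href> and notes any password input."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.has_password_field = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.has_password_field = True

    def handle_data(self, data):
        if self._href is not None:  # text inside the current link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_page(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def fetch(opener, url, form=None):
    """GET url, or POST `form` to it as application/x-www-form-urlencoded."""
    data = urllib.parse.urlencode(form).encode() if form is not None else None
    try:
        with opener.open(url, data=data, timeout=30) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except ValueError as exc:  # malformed URL, e.g. "" or no http:// scheme
        raise ScrapeError(f"invalid URL {url!r}: {exc}") from exc
    except OSError as exc:  # URLError, HTTPError (404, 500, ...), timeouts
        raise ScrapeError(f"request to {url!r} failed: {exc}") from exc


def list_links(login_url, page_url, username, password):
    """Log in, then return the (text, href) pairs of every link on page_url."""
    # The cookie jar carries the session cookie set at login into later
    # requests, much like requests.Session does.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    reply = fetch(opener, login_url, {"username": username, "password": password})
    # Heuristic: a page that still has a password box is most likely the
    # login form sent back, i.e. the credentials were rejected.
    if parse_page(reply).has_password_field:
        raise ScrapeError("login failed: the login form was returned")
    return parse_page(fetch(opener, page_url)).links
```

Typical use is to call `list_links(login_url, page_url, user, password)` inside a `try` block and print each `(text, href)` pair. If that call raises `ScrapeError`, the message reports one of four problems: an unreachable URL, a malformed URL, an HTTP error status, or rejected credentials.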

This is the template I have been trying to get to work, but I am open to any working approach.

1) The Alfresco site is an internal app (behind our firewall), so I cannot provide a user ID.

2) No Node.js; it needs to be Python 3 (I should have specified that).

3) The CSRF token was just a placeholder. I could not find any tokens in the page source, but I could have missed them.

4) For the links, a simple list of the HTML links in the tag.

5) The form header information recorded during login is here:

6) The login java script recorded in the console is here:


Would a solution using Node.js be acceptable? Can you be more specific on the 'list all the links for a specific URL' part?
kostasx 2 years ago
Hello Broadreach, can I get a test username and password so that I can test my Python script?
Codeword 2 years ago
Hey, I noticed you have commented out your CSRF token. If the website you are trying to log in to requires a CSRF token, you need to pass it along with the other data (username and password); otherwise the server will just return the login page:

payload = {
    "username": USERNAME,
    "password": PASSWORD,
    # "csrfmiddlewaretoken": authenticity_token
}
Codeword 2 years ago
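Following the suggestion above, the script can GET the login page first (with the same session), read the token out of the form, and only then POST. The following is a minimal sketch of the extraction step. It assumes the token lives in an `<input>` of the login form. The default field name `csrfmiddlewaretoken` is only reused from the commented-out line above; the real field name on this site is unknown. With `requests`, `login_html` would be `s.get(LOGIN_PAGE_URL).text`, where `LOGIN_PAGE_URL` is a hypothetical placeholder.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Records the value of the first <input> whose name matches field_name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and self.value is None and attrs.get("name") == self.field_name:
            self.value = attrs.get("value") or ""


def build_login_payload(login_html, username, password,
                        token_field="csrfmiddlewaretoken"):  # assumed name
    """Return the POST data, adding the CSRF token only if the form has one."""
    finder = InputValueFinder(token_field)
    finder.feed(login_html)
    finder.close()
    payload = {"username": username, "password": password}
    if finder.value is not None:
        payload[token_field] = finder.value
    return payload
```

If the token is not in the HTML at all (for example, because it is only set as a cookie), this returns the plain username/password payload unchanged.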
To verify that there is no CSRF token: go to the website, open the developer console on the login page, switch to the Network tab, and sign in manually. You will see a request related to the login (pick the most likely one). Click it, open the Headers tab, and look at the Form Data section. Check whether there is a CSRF token there besides the username and password. If there is, my method above will solve the issue; if not, we can take other steps. Thank you.
Codeword 2 years ago
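The same comparison can be made from the script's side. Copy the raw urlencoded body of the login request; browsers can show Form Data in raw form, although the exact tab name varies between browser versions. Then diff its field names against the payload the Python code posts. The following is a minimal sketch; the field values in any example are made up.

```python
from urllib.parse import parse_qsl


def compare_form_data(browser_form_data, script_payload):
    """Compare the browser's submitted form body with the script's payload.

    browser_form_data: raw urlencoded body copied from DevTools,
                       e.g. "username=a&password=b&success=..."
    script_payload:    the dict the Python code passes as POST data.
    Returns (missing, extra): field names the browser sends that the script
    does not, and field names the script sends that the browser does not.
    """
    # keep_blank_values=True so that empty fields such as "failure=" count too
    browser_fields = {name for name, _ in parse_qsl(browser_form_data, keep_blank_values=True)}
    script_fields = set(script_payload)
    return browser_fields - script_fields, script_fields - browser_fields
```

A non-empty `missing` set points at the fields the POST request still needs.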
I am quite sure CSRF protection is enabled. See this in your HTML:
Alfresco.constants.CSRF_POLICY = { enabled: true, cookie: "{token}", header: "{token}", parameter: "{token}", properties: {} }
Codeword 2 years ago
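The `CSRF_POLICY` object is configuration read by the page's own JavaScript. A Python client never runs that JavaScript, so if the server really enforces the policy, the script has to reproduce that behavior itself. Read literally, the entries suggest that the token arrives in a cookie and is sent back in a header and a request parameter. That reading is an interpretation, not confirmed for this Alfresco install. The following is a minimal sketch of the echo step; the token name is a hypothetical placeholder, because the page only shows `"{token}"`.

```python
def csrf_request_parts(cookies, token_name):
    """Echo a CSRF token from a cookie back as a header and a form field.

    cookies:    mapping of cookie name -> value, e.g. the output of
                requests' s.cookies.get_dict() after loading the login page.
    token_name: HYPOTHETICAL; the page only shows the "{token}" placeholder,
                so the real name has to be taken from the browser's DevTools.
    Returns (headers, form_fields); both are empty if the cookie is absent.
    """
    token = cookies.get(token_name)
    if token is None:
        return {}, {}
    # Assumption: the policy uses the same name for the cookie, the header
    # and the parameter, as the identical "{token}" entries suggest.
    return {token_name: token}, {token_name: token}
```

With `requests`, the result would be used as `s.post(LOGIN_URL, data={**payload, **fields}, headers=headers)`.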
Or provide me with the JavaScript files so that I can test them on my system. As I said, this line, Alfresco.constants.CSRF_POLICY = { enabled: true, cookie: "{token}", header: "{token}", parameter: "{token}", properties: {} }, says that CSRF is enabled, and you are saying it's not. Try setting enabled: false and then try your Python code. Thank you.
Codeword 2 years ago
I do not see any entries in the Network tab that look like yours. The closest I get is this JavaScript:
broadreach 2 years ago
Just set LOGIN_URL = "" and enabled: false in Alfresco.constants.CSRF_POLICY, as I suggested above. Let me know how it goes. Thank you
Codeword 2 years ago
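Rather than guessing `LOGIN_URL`, the script can read it from the login page itself: it is the `action` attribute of the form that holds the password box, resolved against the page's own URL. The following is a minimal sketch. It assumes the login page is served as plain HTML containing a `<form>`; if the form is built or submitted by JavaScript, the DevTools Network tab remains the reliable source. Any URLs used with it are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionFinder(HTMLParser):
    """Collects the action of every <form> that contains a password input."""

    def __init__(self):
        super().__init__()
        self.actions = []
        self._current_action = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A missing action means the form posts back to the page's own URL.
            self._current_action = attrs.get("action") or ""
        elif (tag == "input" and self._current_action is not None
              and (attrs.get("type") or "").lower() == "password"):
            self.actions.append(self._current_action)
            self._current_action = None  # record each form only once

    def handle_endtag(self, tag):
        if tag == "form":
            self._current_action = None


def find_login_url(page_url, html):
    """Return the absolute URL the login form submits to, or None."""
    finder = FormActionFinder()
    finder.feed(html)
    finder.close()
    if not finder.actions:
        return None
    return urljoin(page_url, finder.actions[0])  # resolve relative actions
```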
Have you tried setting the parameters as above? If that solution is not working, there could be another possibility: apart from username and password, the form is also submitting two hidden input values, named success and failure, which we are not passing in our Python code.
Codeword 2 years ago
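Sending those hidden fields can be automated: fetch the login page with the same session, copy every hidden input into the payload, then add the credentials. The following is a minimal sketch. It assumes the hidden inputs are present in the served HTML (not injected later by JavaScript) and that the login form is the only form on the page with hidden fields. With `requests`, the result would be posted with `s.post(login_url, data=payload)`.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name -> value for every <input type="hidden"> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "input" and (attrs.get("type") or "").lower() == "hidden"
                and attrs.get("name")):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def login_payload(login_html, username, password):
    """Hidden fields (e.g. success/failure) plus the user's credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    collector.close()
    # Credentials go last so they override any same-named hidden field.
    return {**collector.fields, "username": username, "password": password}
```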
Sorry, I don't have the files here locally, so I can't test; this is the only way for me to help you. Moreover, I don't think there are any major issues with your Python code apart from the login URL. I think we are not passing the parameters correctly in our POST request. Thank you
Codeword 2 years ago
awarded to Codeword


3 Solutions

Try ...

Thanks, but it's not working. The login form is returned.
broadreach 2 years ago

Try setting


Update with a cleaner version

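import requests
from bs4 import BeautifulSoup

USERNAME = "lmcrory"
PASSWORD = "xxxxx"

LOGIN_URL     = ""  # URL the login form submits to
DASHBOARD_URL = ""  # page whose links should be listed

def listLinks():
    # The session keeps the login cookies for the following requests
    s = requests.Session()

    # Perform login
    result = s.post(LOGIN_URL, data={
        "username": USERNAME,
        "password": PASSWORD,
    })
    result.raise_for_status()  # stop on 4xx/5xx responses

    # Scrape url
    html = s.get(DASHBOARD_URL).content
    soup = BeautifulSoup(html, "html.parser")
    # Every <a> element that has an href; narrow the selector to limit the region
    for link in soup.select("a[href]"):
        print("{}\t{}".format(link.text, link.attrs["href"]))

listLinks()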

Thanks for the tip @broadreach
tomtoump 2 years ago

Try setting enabled: false in the HTML file, as shown below:

Alfresco.constants.CSRF_POLICY = {
   enabled: false,
   cookie: "{token}",
   header: "{token}",
   parameter: "{token}",
   properties: {}
};

Thanks :)

Thanks, but how do I send the policy information?
broadreach 2 years ago
I know your information is private, and I understand, but don't worry: we have 6 more days before the bounty expires. We can try various approaches, and I am hopeful that we will come up with a solution :). I am sure there is no major issue with your Python code; that is why I am putting more emphasis on the other side. First, just set LOGIN_URL = "" and enabled: false in Alfresco.constants.CSRF_POLICY, and let me know what response you get from the Python code. That would help me debug it. Thank you
Codeword 2 years ago
As soon as you have done this, let me know, so that I can tell you the next thing I am thinking of.
Codeword 2 years ago
Where / how do I set enabled=false?
broadreach 2 years ago
In the HTML file you sent, find Alfresco.constants.CSRF_POLICY and set its enabled property to false. Also, don't forget to set the login URL in the Python code, and let me know what the Python code returns. Thank you
Codeword 2 years ago
So, did it work? Thank you
Codeword 2 years ago
Perfect! Thanks for pushing through this. It's so much tougher when you can't test. Exactly what I needed. Thanks again!
broadreach 2 years ago
Thank you for awarding me the bounty.
Codeword 2 years ago
Really? Stealing my solution?
tomtoump 2 years ago
Well deserved, I hope you got the tip too?
broadreach 2 years ago
The JS code has nothing to do with the issue.
tomtoump 2 years ago
@tomtoump Hey man, easy. I didn't know what the problem was because I didn't have the files with me, so I tried every possible way to make it work. I didn't steal your solution.
Codeword 2 years ago
Suggesting the same as me, after me, is kind of stealing my solution.
tomtoump 2 years ago
Are you kidding me? In the era of the internet, coincidence is normal. Just do one thing: type your name into Google and you will probably find a domain with it, and the same goes for anything you type. Moreover, I agree that your login URL suggestion is the same as mine, and it has to be, as there is only one login URL. But your thinking is not unique, and neither is mine. So the one thing that matters is how much effort you put into solving the issue. Guessing the submit URL right is not the issue; it's very basic. The issue is proving that your solution is 100% accurate, along with the effort and assistance you provide. Let me cite an example: the idea that gravitational waves exist is Einstein's, so why did the Nobel Prize go to the scientists who proved their existence and not to Einstein? Thank you
Codeword 2 years ago
LOL. Very basic, but you needed 10 hours, and me suggesting it first, before you copy-pasted it. Anyway, I don't have any problem with you, but with @broadreach for not selecting the solution he should have.
tomtoump 2 years ago
I don't have a problem with anyone either. But if you still think I copied, check my account against yours for reference, including the hit ratio. Thank you
Codeword 2 years ago
Hey @Codeword, I'm @chlegou; we have worked together on many projects. Can you please email me? I really need to talk to you, ASAP! My email is: Looking for a quick answer from you. :)
Chlegou 2 years ago