Cloudflare 1020 - Browser vs CURL

So, this is simply a curiosity task. As far as I've researched, this is not possible to do, but I figured I'd ask here and offer a bounty to anyone who can explain why it's not possible. Then again, if you can actually bypass this, I'd be happy to pay 5x the bounty.

So... if you go directly to this URL:
https://www.backstage.com/talent/typeahead/actor/?q=John&profileType=tal

You get the JSON response - all good.
If you access it via PHP cURL, you get blocked (Cloudflare error 1020).

How is it possible that this cannot be spoofed in any way? How does the browser get access to it without any extra permissions, while a cURL request, which I have always thought was equivalent to accessing it through a browser, fails even if you spoof the User-Agent, RESOLVE, and Referer?

  • If you can provide examples of how I can set that up, the bounty is yours.
  • If you can find the error of my ways and show which parameters I'm missing in my code, hit me up and I'll be happy to increase the bounty.


3 Solutions


If the browser is able to load the page, you can inspect what the browser is doing. In Google Chrome you can use the following steps to see what's happening.

1) Open View > Developer > Developer Tools and select the Network tab.

2) Load the URL (or click the link) so that the request is recorded.

3) The request will appear in the list on the Network tab; selecting it shows its headers on the Headers tab.

4) Right-click the request and select Copy > Copy as cURL.

Now you have a cURL command that replays the browser's request. It will probably have excess parameters you can trim away.

More details: https://lornajane.net/posts/2013/chrome-feature-copy-as-curl
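
As a rough illustration of what replaying those copied headers looks like in code, here is a minimal Python sketch using only the standard library. The header values are placeholders rather than values from a real session, and the function names `build_request` and `fetch` are made up for this example. Note that a site can still reject such a request, so the sketch returns the status code instead of assuming success.

```python
import urllib.error
import urllib.request

URL = "https://www.backstage.com/talent/typeahead/actor/?q=John&profileType=tal"

# Placeholder values (assumptions): paste the real ones from "Copy as cURL".
COPIED_HEADERS = {
    "User-Agent": "Mozilla/5.0 (placeholder, copy the real value from the browser)",
    "Accept": "application/json, text/plain, */*",
    "Referer": "https://www.backstage.com/",
}


def build_request(url, headers):
    """Build (but do not send) a GET request carrying the copied headers."""
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch(url, headers, timeout=10):
    """Send the request and return (status_code, body_text).

    HTTP errors such as 403 are returned rather than raised, so a block
    page can be inspected like any other response.
    """
    request = build_request(url, headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `fetch(URL, COPIED_HEADERS)` and printing the status shows quickly whether the headers alone are enough for a given site.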


Explanation

Hello tolousn,

As one of its many security checks, Cloudflare tests whether the client is capable of running JavaScript.

Since cURL can't run JavaScript, the request will always fail, no matter which headers you send.

To get data from this URL, you would need to drive a real browser with a tool such as Selenium or Puppeteer.
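
A minimal sketch of the Selenium route in Python, assuming a recent Selenium 4 release and a local Chrome install. The helper names (`build_url`, `parse_json_body`, `fetch_json_with_browser`) and the fixed five-second wait are assumptions made for illustration, not part of any library. Whether a given site accepts an automated browser depends on its configuration, so treat this as a starting point rather than a guaranteed bypass.

```python
import json
import time
from urllib.parse import urlencode

BASE_URL = "https://www.backstage.com/talent/typeahead/actor/"


def build_url(query, profile_type="tal"):
    """Rebuild the typeahead URL from the question for a given search term."""
    return BASE_URL + "?" + urlencode({"q": query, "profileType": profile_type})


def parse_json_body(body_text):
    """Decode the JSON that the browser shows as plain text in the page body."""
    return json.loads(body_text)


def fetch_json_with_browser(url, wait_seconds=5):
    """Load the URL in a real Chrome window and return the decoded JSON."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # requires Chrome to be installed locally
    try:
        driver.get(url)
        # Crude fixed pause (an assumption) to let any JavaScript check finish.
        time.sleep(wait_seconds)
        body_text = driver.find_element(By.TAG_NAME, "body").text
        return parse_json_body(body_text)
    finally:
        # Always close the browser, even if loading or parsing fails.
        driver.quit()
```

For example, `fetch_json_with_browser(build_url("John"))` opens Chrome, loads the same URL as in the question, and returns the parsed response if the page body contains JSON.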

Lastly, as a bonus from an experienced scraper: don't set headers manually. Let the browser generate them for you:

F12 > Select request > Right-click > Copy as cURL > Immediate validation :)

Let me know if you need anything more!

Best regards,
Vladimir

Sorry for the delay in the bounty. Yes, that would explain it. I had obviously tried the methods listed above to no avail, so I knew there was something that had to be preventing regular cURL calls. Thanks, Vladimir. I know nothing about Selenium/Puppeteer, but if I can get it to work, I'll come back here with the tip. Thanks again for explaining that to me.
tolousn 30 days ago
This solution DOES work. Thanks so much, I sent a nice tip your way... and I just read my intro, let me send another tip. I'm grateful for that introduction to the new library of resources.
tolousn 29 days ago
Hello, @tolousn. I appreciate the tip so much and I am very happy to hear you were finally able to accomplish your goal! Apologies for the late reply here, notifications were off. Feel free to reach out anytime!
VladimirMikulic 22 days ago