Iframe cross-origin issue

For developers using the Construct 2 Javascript SDK

Post » Tue Jan 31, 2017 9:46 pm

Hi there,
I've noticed that an iframe can access other domains without being blocked by cross-origin (CORS) restrictions when it runs in NW.js, but the same thing does not work when testing in a browser.

As far as I know, we can bypass CORS by disabling web security in Chrome, which is not a suitable solution.
I'm wondering how web scraping services manage to bypass it, since they all use an iframe to analyze the content of a website.
I'm also wondering how the Chromium browser in NW.js does it. Does it have its web security flag disabled?

Thanks.
Banned User | Badges: B 17, S 7, G 24 | Posts: 388 | Reputation: 14,494

Post » Wed Feb 01, 2017 1:14 am

This is something that I am also trying to figure out.

Web scrapers tend to control a web browser process. Some use PhantomJS, a headless web browser optimized for that sort of thing:
http://phantomjs.org/

Some websites detect that a web scraper is trying to access them and block it, so your scraper needs to present itself to them as a browser.
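
A minimal sketch of what "presenting itself as a browser" can mean in practice, using only Python's standard library. The User-Agent string and the helper name browser_like_request are made up for illustration, and many sites check far more than headers, so this is only the simplest case:

```python
from urllib.request import Request

# Hypothetical User-Agent value; real scrapers usually copy one from a current browser.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def browser_like_request(url: str) -> Request:
    """Build a request whose headers resemble those a desktop browser sends."""
    return Request(
        url,
        headers={
            "User-Agent": BROWSER_UA,
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.8",
        },
    )


req = browser_like_request("http://www.web-source.net")
# urllib.request.urlopen(req) would actually send it; that call is left out
# so the sketch has no network dependency.
```
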

There are multiple scraping modules for Python. Some other people write their web scrapers in Ruby on Rails.

I did my first one in AutoHotkey + IE (COM). It's a pretty lame choice, but it works. AHK has regular expressions and even a built-in GUI toolkit. It's full of goodies.

Python is another great one if you are more serious about it. You can use Python + Flask + Beautiful Soup (it's better than regular expressions) to make a web app, but I have never tried to make a web app that is a web scraper yet. I might give it a try in the future, as I am getting pretty far with my research there.

I have run into the iframe security limitation just like you have (cross-domain access forbidden), and I have yet to figure out an elegant way around it. JavaScript or jQuery won't allow it, so you might have to do something extra to get around that.
A strategy I want to try: download the target HTML to your localhost folder (Flask), then load it inside the iframe. That way it will be on the same domain as the page that loads it in the iframe.
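
The download-then-serve idea could be sketched roughly like this, standard library only. The function names, the "static" folder and the snapshot.html file name are assumptions for illustration; a Flask app (or any local server) would then serve that folder so the iframe and the parent page share an origin:

```python
import re
from html import escape
from pathlib import Path
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Download a page server-side, where the browser's same-origin policy does not apply."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def add_base_href(html: str, base_url: str) -> str:
    """Insert a <base> tag after <head> so relative links still resolve against the original site."""
    tag = '<base href="%s">' % escape(base_url, quote=True)
    return re.sub(
        r"(<head(?:\s[^>]*)?>)",
        lambda m: m.group(1) + tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


def save_snapshot(html: str, static_dir: Path, name: str = "snapshot.html") -> Path:
    """Write the HTML into a folder that the local web server exposes."""
    static_dir.mkdir(parents=True, exist_ok=True)
    target = static_dir / name
    target.write_text(html, encoding="utf-8")
    return target
```

Usage would be something like save_snapshot(add_base_href(fetch_html(url), url), Path("static")), with the iframe pointing at the saved file on localhost. Scripts inside the copied page may still misbehave when served from a different origin, so expect this to work best on mostly static pages.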
Badges: B 40, S 15, G 4 | Posts: 426 | Reputation: 5,848

Post » Wed Feb 01, 2017 2:56 am

@blurymind Finally, here is the solution:
<object data="http://www.web-source.net" width="600" height="400">
    <embed src="http://www.web-source.net" width="600" height="400">
    Error: Embedded data could not be displayed.
</object>

I had never heard of the <embed> tag before (maybe it is an HTML5 addition), but it does the job perfectly and looks more elegant than an iframe. No more CORS!
Well, technically it won't work if the target sets 'X-Frame-Options' to 'SAMEORIGIN' (as Google does).
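
To know in advance whether a target refuses to be embedded, its response headers can be inspected. The sketch below is a rough heuristic, not a complete implementation: the function name framing_blocked is made up, and it only covers the obvious cases (X-Frame-Options set to DENY or SAMEORIGIN, or a Content-Security-Policy frame-ancestors directive limited to 'none' or 'self'):

```python
from typing import Mapping


def framing_blocked(headers: Mapping[str, str]) -> bool:
    """Return True if the headers clearly forbid embedding by a third-party page."""
    normalized = {k.lower(): v for k, v in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both block a page on another origin.
    xfo = normalized.get("x-frame-options", "").strip().upper()
    if xfo in ("DENY", "SAMEORIGIN"):
        return True

    # Content-Security-Policy: look for a frame-ancestors directive.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            sources = [p.lower() for p in parts[1:]]
            # Only the unambiguous cases are handled; host lists are treated as unknown.
            if sources and all(s in ("'none'", "'self'") for s in sources):
                return True
    return False
```

For example, framing_blocked({"X-Frame-Options": "SAMEORIGIN"}) returns True, while a response with neither header returns False.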


Edit: This does not seem to be the ideal way to scrape, since the embed tag hides the DOM elements of the embedded page.
The way to go must be the solution you proposed: loading the content into a blank.html.

Post » Wed Feb 01, 2017 2:02 pm

I need to grab string values from elements inside the iframe, so this wouldn't do it for me :(

Post » Wed Feb 01, 2017 2:43 pm

blurymind wrote: I need to grab string values from elements inside the iframe, so this wouldn't do it for me :(

Me too. I tried your solution and it partially worked in NW.js, but not in standard browsers.
This is what I did:
1- Created a second iframe (Iframe2) next to the iframe that loads the website (Iframe1).
2- Got the document innerHTML of Iframe1.
3- Assigned that innerHTML to Iframe2.
4- Got the strings from Iframe2.

It worked well for most websites, but not all of them allow this.
Just give it a try.

Post » Wed Feb 01, 2017 11:44 pm

Yeah... I still need it to work in a browser though.

I can already easily grab any info from the target websites via web scraping, with no need to use iframes at all.
I have reverse engineered how they work and already have code that scrapes them :lol:

But the current job assignment requires me to make a web form that grabs data from a website. Of course none of my bosses understand how these things work; they just want a web form with the required validations. But the data needed to validate some fields is stored on another website with another database. So my hacky web scraping solutions only work when the submission form is a native app that runs a web scraping macro.

My idea was to make a web scraper with a web interface. But I put that on hold, because they might eventually give me access to host my form on the same domain, which will of course get rid of the security block.

Post » Wed Feb 01, 2017 11:51 pm

@x3m Does your solution make it possible to grab the current URL of what's inside the frame?
Basically, if the user clicks on a link inside the embedded frame, can we use the DOM in the web browser console to access the updated content URL of the frame?
That might solve it for me, at least partially, because some of the result's information is in the query string of the URL the user clicks on.
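
For the query string part, once the clicked URL is available on the scraper side, pulling values out of it is straightforward. A small sketch with Python's standard library follows; the example URL and its parameter names are invented for illustration:

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict:
    """Return the query-string parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


# Hypothetical result URL, not taken from any real site.
params = query_params("http://example.com/results?id=42&name=Jane+Doe")
print(params["id"][0], params["name"][0])
```

Each value is a list because a parameter name can appear more than once in a query string. Getting hold of the URL is the hard part: for a cross-origin frame, the parent page cannot read the frame's current location from script.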

Post » Thu Feb 02, 2017 12:29 am

@blurymind Nah, since I'm using the sandbox attribute without allow-scripts.
Well, if you want to make a robust web scraper, then client-side JavaScript is not the best way. It's better to make it server-side using Node.js.

Here is a preview of my halted project. Basically, you pick whatever DOM element you want to scrape; I get the className first, and if it does not exist I get the tagName, then scrape similar elements:

[Image]

But I'm willing to complete this small project, since there is no web scraping software out there that is based on NW.js.
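
The "className first, tagName as fallback" selection described above could look roughly like this on the server side. This is a sketch in Python with the standard html.parser module, not the code of the project itself; the class and function names are invented, and it assumes reasonably well-formed markup where non-void tags are closed:

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SimilarElementScraper(HTMLParser):
    """Collect the text of every element similar to the one the user picked."""

    def __init__(self, tag, class_name=None):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self.results = []
        self._depth = 0      # nesting depth inside a matched element
        self._buffer = []

    def _matches(self, tag, attrs):
        # Prefer the class name; fall back to the tag name if there is none.
        if self.class_name:
            classes = (dict(attrs).get("class") or "").split()
            return self.class_name in classes
        return tag == self.tag

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self._matches(tag, attrs):
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def scrape_similar(html, tag, class_name=None):
    parser = SimilarElementScraper(tag, class_name)
    parser.feed(html)
    parser.close()
    return parser.results
```

Calling scrape_similar(html, "li", "price") collects every element carrying the class "price", while scrape_similar(html, "li") falls back to collecting every li element.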