 

Get a JavaScript-generated DOM element

My aim is to get the element <div id="calender">, and everything inside it, as it is shown in a browser. The problem is that simply fetching the HTML source doesn't work: the element I am looking for does not exist in the HTML returned by the PHP function file_get_contents.

I have tried to get the source in PHP with XPath, using DOMXPath (https://www.php.net/manual/en/class.domxpath.php), which is a nice tool for extracting the content of any tag in an HTML page. But the problem might be that the element (a calendar) is added to the loaded page by JavaScript and therefore cannot be caught by server-side PHP. So, is there a way I can catch such an element (div) with JavaScript instead?
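One way to "catch" a script-generated element with JavaScript is to run code inside the page that builds it (for example, pasted into the DevTools console) and wait until the element exists. The sketch below assumes the id `calender` quoted in the question and uses a simple polling loop (a MutationObserver would be an alternative); `waitForElementHtml` and its options are hypothetical names. It only works within the calendar page itself: a script on your own page cannot read another site's DOM.

```javascript
// Sketch: wait until a script-generated element exists, then return its markup.
// Intended for the browser console of the page that builds the calendar;
// the id "calender" is taken from the question (assumption).
function waitForElementHtml(doc, id, { intervalMs = 200, timeoutMs = 10000 } = {}) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const check = () => {
      const el = doc.getElementById(id);
      if (el) {
        // Element is present: hand back its full HTML, children included.
        resolve(el.outerHTML);
      } else if (Date.now() - started >= timeoutMs) {
        reject(new Error(`#${id} did not appear within ${timeoutMs} ms`));
      } else {
        // Not there yet: try again after a short pause.
        setTimeout(check, intervalMs);
      }
    };
    check();
  });
}

// Usage in the browser console:
// waitForElementHtml(document, "calender").then((html) => console.log(html));
```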

There are JavaScript examples for this kind of problem (if I have understood them correctly), but currently I cannot get even a simple script to work. The example below shows how I have tried to build the code. The $.ajax call is just one approach I have tried, but I don't know how to use it. I also cannot figure out why the simple test functions do not work.

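    <!doctype html>
    <html lang="fi">
    <head>
    <meta charset="utf-8">
    <title>load demo</title>
    <style>
    body {
        font-size: 12px;
        font-family: Arial;
    }
    </style>
    <!-- $.ajax comes from jQuery, so the library has to be loaded first -->
    <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
    <script type="text/javascript">
    // A function declaration needs a parameter list, and alert() is a function call
    function ok() {
        alert("OK");
    }

    function get_html(my_html) {
        // A <p> element has no .value; read its text instead
        var l = document.getElementById('my_link').textContent;
        alert(l);
        alert(my_html);
        $.ajax({
            url: my_html,
            dataType: 'html',   // object-literal properties are separated by commas
            success: function (data) {
                // Browser JavaScript cannot write local files (fs is a Node.js module),
                // so only show the beginning of the response
                alert(data.slice(0, 200));
            },
            error: function (xhr, status) {
                // A cross-origin request ends up here unless the server allows it via CORS
                alert("request failed: " + status);
            }
        });
    }
    </script>
    </head>
    <body>
    <!-- Without "http://" the URL would be resolved relative to the current page -->
    <p id="my_link" onclick='get_html("http://www.lomarengas.fi/en/cottages/kuusamo-rukasaukko-9192")'>html-link</p>
    <p id="ok" onclick="ok()">show ok</p>
    </body>
    </html>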

Briefly: I have a link to a web page that shows a (booking) calendar, but the calendar is missing from the "normal" source code fetched with file_get_contents (PHP). If I browse the HTML with Chrome's developer tools (F12), I can find the calendar there. I want to get that information with JavaScript, PHP or something similar.

user2857221 asked Dec 08 '25 06:12


1 Answer

If you read the source code of the page you point to (http://www.yllaksenonkalot.fi/booking/varaukset_akas.php), you will notice that the calendar is loaded via an iframe.

That iframe points to this location:

http://www.nettimokki.com/bookingCalendar.php?id_cottage=3629&utm_source=widget&utm_medium=widget&utm_campaign=widget

which is the real source of the calendar.


EDIT following your comment on this answer

Considering the real link: http://www.lomarengas.fi/en/cottages/kuusamo-rukasaukko-9192

If the calendar is not part of the HTML returned by the server, it is most likely generated asynchronously (in JavaScript, on the client side).

From this assumption, I inspected the source code again. In the Network tab of my browser's developer tools, where you can monitor which files are loaded, I looked for calls to the server (everything except requests for static resources such as images and stylesheets).

I then noticed calls to several URLs with a .json extension, such as http://www.lomarengas.fi/api-ib/search/availability_data.json?serviceNumber=9192&currentMonthFirstDate=&duration=7.
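A request like the one above can be rebuilt programmatically. This is a minimal sketch that assumes the three query parameters from that single observed URL are all the endpoint needs; their exact meaning is not documented here, and `availabilityUrl` is a hypothetical helper name.

```javascript
// Sketch: rebuild the availability-data URL seen in the Network tab.
// Parameter names are copied from one observed request (assumption:
// the endpoint is undocumented and may change or require more).
function availabilityUrl(serviceNumber, { currentMonthFirstDate = "", duration = 7 } = {}) {
  const url = new URL("http://www.lomarengas.fi/api-ib/search/availability_data.json");
  url.searchParams.set("serviceNumber", String(serviceNumber));
  url.searchParams.set("currentMonthFirstDate", currentMonthFirstDate);
  url.searchParams.set("duration", String(duration));
  return url.toString();
}

console.log(availabilityUrl(9192));
// http://www.lomarengas.fi/api-ib/search/availability_data.json?serviceNumber=9192&currentMonthFirstDate=&duration=7
```

A server-side script (PHP or Node) requesting this URL is not restricted by the browser's same-origin policy; whether you may use the data is a separate question for the site owner.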

Feeling I was on the right track (asynchronous JavaScript calls that build the HTML from JSON data), I looked for JavaScript files that were not the usual libraries (jQuery, Bootstrap and such).

I stumbled upon this file: http://www.lomarengas.fi/resources_responsive/js/destination.js. It contains the code that generates the calendar asynchronously.

tl;dr

The calendar is indeed generated asynchronously.

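You can't get the full HTML with curl or file_get_contents in PHP, because they only return the initial HTML and don't run the page's JavaScript, and you can't fetch it with Ajax from your own page because of the same-origin policy (unless the site explicitly allows it with CORS headers).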
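The same-origin policy compares origins, that is scheme, host and port. Below is a minimal sketch of that comparison using the standard `URL` class; `isSameOrigin` is a hypothetical helper, and `http://localhost/test.html` stands in for your own page. In a real browser the target server can still relax the rule for its own responses by sending CORS headers such as `Access-Control-Allow-Origin`.

```javascript
// Sketch: browsers treat two URLs as same-origin only when scheme,
// host and port all match; URL.origin combines exactly those parts.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const calendarPage = "http://www.lomarengas.fi/en/cottages/kuusamo-rukasaukko-9192";
// Your own page (hypothetical address) vs. the calendar page: different origins.
console.log(isSameOrigin("http://localhost/test.html", calendarPage)); // false
// The calendar page vs. its own JSON endpoint: same origin.
console.log(isSameOrigin(calendarPage, "http://www.lomarengas.fi/api-ib/search/availability_data.json")); // true
```

This is why the site's own destination.js can load the JSON data while a script on another site cannot.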

By the way, you should contact the site to ask whether you may access their API from PHP, with their consent.

Hope this helps you understand the whole thing.

Lex Lustor answered Dec 09 '25 19:12