 

Download Pandora source with Java?

Tags: java, download

I'm trying to download the HTML of www.pandora.com/profile/stations/olin_d_kirkland with Java so that it matches what I get when I select 'View page source' from the page's context menu in Chrome.

I know how to download a webpage's HTML source with Java. I have done it with downloads.nl and tested it on other sites. Pandora, however, is a mystery. My ultimate goal is to parse the 'Stations' from a Pandora account.

Specifically, I would like to grab the station names from a page such as www.pandora.com/profile/stations/olin_d_kirkland

I have tried the Selenium library and Java's built-in URL classes, but I only get about 4700 lines of HTML when I should be getting about 5300. On top of that, the result contains none of the personalized data I'm looking for.

I assumed the problem was that I wasn't fetching the JavaScript, or wasn't letting it execute first, but even when my code waits for the page to load, I always get the same result.

If at all possible, I would like a method called 'grabPageSource()' that returns the page source as a String when called.


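import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PandoraStationFinder {
    // Matches a whole line that is a station link, e.g. <a href="/station/123">Name</a>
    private static final Pattern STATION_LINK =
            Pattern.compile("<a href=\"/station/\\d+\">[\\w\\s]+</a>");

    public static void main(String[] args) throws IOException, InterruptedException {
        String s = grabPageSource();
        // Split on any line terminator; the literal "\n\r" sequence almost never occurs
        String[] lines = s.split("\\R");
        List<Station> stations = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            String t = lines[i].trim();
            Matcher m = STATION_LINK.matcher(t);
            if (m.matches()) {
                // Station is a class that is not shown in this post
                Station someStation = new Station(t);
                stations.add(someStation);
                // System.out.println("I found a match on line " + i + ".");
                // System.out.println(t);
            }
        }
    }

    public static String grabPageSource() throws IOException {
        String fullTxt = "";
        // Get HTML from www.pandora.com/profile/stations/olin_d_kirkland
        return fullTxt;
    }
}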

How it's done doesn't matter, but in the final product I'd like to grab a comprehensive list of ALL songs that a user has liked on Pandora.

asked Mar 23 '26 09:03 by Olin Kirkland


2 Answers

The Pandora pages are built largely with AJAX, which is why many scrapers struggle with them: the HTML returned for the main URL does not contain the data that the browser later loads. In the case you've shown above, the list of stations, the page makes a secondary request to:

http://www.pandora.com/content/stations?startIndex=0&webname=olin_d_kirkland

If you point your request at that URL rather than at the main page, I think you will have a lot more luck with your scraping.
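As a rough sketch of that idea, the code below requests the fragment URL and pulls station names out of it. It assumes Java 11+ (for java.net.http) and assumes the fragment contains links in the shape the question's regex expects; the class name StationFragmentScraper and its method names are made up for illustration, and the markup Pandora actually returns may differ.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names here are illustrative only.
class StationFragmentScraper {
    // Same link shape as the question's regex, plus a capture group for the name.
    // Assumption: the fragment really contains links of this form.
    private static final Pattern STATION_LINK =
            Pattern.compile("<a href=\"/station/\\d+\">([\\w\\s]+)</a>");

    // Builds the secondary-request URL quoted in the answer.
    static String stationsUrl(String webname, int startIndex) {
        return "http://www.pandora.com/content/stations?startIndex=" + startIndex
                + "&webname=" + URLEncoder.encode(webname, StandardCharsets.UTF_8);
    }

    // Downloads the response body of the given URL as a String.
    static String grabPageSource(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Scans the whole document with find(), so links need not sit on their own line.
    static List<String> extractStationNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = STATION_LINK.matcher(html);
        while (m.find()) {
            names.add(m.group(1).trim());
        }
        return names;
    }
}
```

A call such as extractStationNames(grabPageSource(stationsUrl("olin_d_kirkland", 0))) would then return the names as a list. Using find() over the whole body avoids splitting into lines first, which the line-by-line matches() approach in the question depends on.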

Similarly, to access the "likes", you want this URL: http://www.pandora.com/content/tracklikes?likeStartIndex=0&thumbStartIndex=0&webname=olin_d_kirkland

This will pull back the liked tracks in groups of 5, but you can page through the results by increasing the 'thumbStartIndex' parameter.
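The paging could be sketched roughly as follows. The step of 5 comes from the group size mentioned above; stopping when a page comes back blank is only an assumption about how the endpoint signals the end of the list, and LikesPager, likesUrl and collectPages are hypothetical names. The fetch step is passed in as a function so the loop can be tried without touching the network.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper; the names here are illustrative only.
class LikesPager {
    // Liked tracks come back in groups of 5, per the answer.
    static final int PAGE_SIZE = 5;

    // Builds the tracklikes URL quoted in the answer for a given offset.
    static String likesUrl(String webname, int thumbStartIndex) {
        return "http://www.pandora.com/content/tracklikes?likeStartIndex=0"
                + "&thumbStartIndex=" + thumbStartIndex
                + "&webname=" + URLEncoder.encode(webname, StandardCharsets.UTF_8);
    }

    // Calls fetchPage with offsets 0, 5, 10, ... and collects the bodies.
    // Assumption: a null or blank body means there are no more likes.
    // maxPages is a safety cap so the loop always terminates.
    static List<String> collectPages(IntFunction<String> fetchPage, int maxPages) {
        List<String> pages = new ArrayList<>();
        for (int page = 0; page < maxPages; page++) {
            String body = fetchPage.apply(page * PAGE_SIZE);
            if (body == null || body.isBlank()) {
                break;
            }
            pages.add(body);
        }
        return pages;
    }
}
```

In real use, fetchPage would download likesUrl(webname, offset) with whatever HTTP code you already have, and each collected page would then be parsed for track names.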

answered Mar 24 '26 22:03 by Erica


Not an answer exactly, but hopefully this will get you moving in the correct direction:

Whenever I get into this sort of thing, I fall back on an HTTP monitoring tool. I use Firefox, and I really like the Live HTTP Headers extension. Check which headers go back and forth, then tailor your HTTP requests accordingly. As an absolute lowest-level test, grab the headers from a successful request, send them to port 80 using telnet, and see what comes back.
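If you would rather stay in Java than use telnet, the same low-level test can be approximated with a plain socket. This is only a sketch: RawHttpProbe and its methods are hypothetical names, and the headers you pass in should be whatever you captured from a successful browser request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper; the names here are illustrative only.
class RawHttpProbe {
    // Assembles the request text exactly as it would be typed into telnet.
    static String buildRequest(String host, String path, Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        // Ask the server to close the connection so reading to end-of-stream finishes.
        sb.append("Connection: close\r\n\r\n");
        return sb.toString();
    }

    // Sends the raw request and returns everything the server writes back,
    // status line and response headers included.
    static String send(String host, int port, String rawRequest) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(rawRequest.getBytes(StandardCharsets.ISO_8859_1));
            out.flush();
            InputStream in = socket.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        }
    }
}
```

For example, send("www.pandora.com", 80, buildRequest("www.pandora.com", "/content/stations?startIndex=0&webname=olin_d_kirkland", headers)) lets you compare the raw response with what the browser receives and add or remove headers until the two agree.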

answered Mar 24 '26 23:03 by Kevin Day


