I want to search for a keyword or hashtag and retrieve the images from every tweet that matches. Using Twitter4J with Java, I can easily issue a query and retrieve the resulting tweets. I know that if I visit the http://t.co/xxxx links in my browser, I see the associated image, and that the image is hosted at https://pbs.twimg.com/xxxxx. So it seems like all I have to do is reproduce that process in my code!
I can parse the http://t.co/xxxx link out of each tweet easily enough. However, when I retrieve the HTML from that link, I don't see any https://pbs.twimg.com/xxxx images :( I think Twitter is loading those images through JavaScript.
Is there an easy way to retrieve the images from each tweet?
This is what I have so far:
package com.company;
import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) throws Exception {
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
                .setOAuthConsumerKey("xxxxxxxxxx")
                .setOAuthConsumerSecret("xxxxxxxxxxxx")
                .setOAuthAccessToken("xxxxxxxxx-xxx-xxxxxxxx")
                .setOAuthAccessTokenSecret("xxxxxxxxxxxxxxxxxxx");
        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getInstance();
        Query query = new Query("#hashtag");
        QueryResult result = twitter.search(query);
        Pattern pattern = Pattern.compile("http://t.co/\\w{10}");
        // No spaces inside the alternation: whitespace in a Java regex is literal unless Pattern.COMMENTS is set.
        Pattern imagePattern = Pattern.compile("https://pbs\\.twimg\\.com/media/\\w+\\.(png|jpg|gif)(:large)?");
        for (Status status : result.getTweets()) {
            if (status.isRetweet())
                continue;
            System.out.println("@" + status.getUser().getScreenName() + ":" + status.getText());
            Matcher matcher = pattern.matcher(status.getText());
            if (matcher.find()) {
                System.out.println("found a t.co url");
                URL oracle = new URL(matcher.group());
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(oracle.openStream()));
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    matcher = imagePattern.matcher(inputLine);
                    if (matcher.find())
                        System.out.println("YAYAAYAYAYYAYAYAYAYAYAYAYAYAAYAYYAYAAYYAYAYAYA: " + matcher.group());
                }
                in.close();
            }
        }
    }
}
There is a simpler way to retrieve the images in tweets.
If a tweet has an image attached, you can call getMediaEntities() on the status to get its media entities, and then call getMediaURL() on each entity to retrieve the image URL.
You could do something like this:
MediaEntity[] media = status.getMediaEntities(); // get the media entities attached to the status
for (MediaEntity m : media) { // iterate over the entities
    System.out.println(m.getMediaURL()); // print the URL of each media item
}
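If you still need to pull pbs.twimg.com URLs out of raw HTML for some reason, note that whitespace inside a Java regex alternation is matched literally unless Pattern.COMMENTS is set, so a group written as (png | jpg | gif) will not match an ordinary ".jpg" extension. The following is only a minimal sketch of that behaviour; the class name ImageUrlRegexDemo, the helper findAll, and the sample URL are made up for illustration and are not part of Twitter4J.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Twitter4J.
public class ImageUrlRegexDemo {
    // The spaces inside the group are literal, so this only accepts
    // extensions written as "png ", " jpg " or " gif".
    public static final Pattern WITH_SPACES = Pattern.compile(
            "https://pbs\\.twimg\\.com/media/\\w+\\.(png | jpg | gif)(:large)?");

    // Same pattern with the spaces removed from the alternation.
    public static final Pattern WITHOUT_SPACES = Pattern.compile(
            "https://pbs\\.twimg\\.com/media/\\w+\\.(png|jpg|gif)(:large)?");

    // Returns every substring of text that matches the given pattern, in order.
    public static List<String> findAll(Pattern pattern, String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample line of HTML, used only to exercise the two patterns.
        String html = "<img src=\"https://pbs.twimg.com/media/AbC123.jpg:large\">";
        System.out.println(findAll(WITH_SPACES, html));
        System.out.println(findAll(WITHOUT_SPACES, html));
    }
}
```

With the sample line above, the first pattern finds nothing and the second finds the full image URL. Even so, getMediaEntities() is the more robust route, since it does not depend on the page markup.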