I am trying to get all the tweets on Twitter through the twitter4j TwitterStream object, but I'm not sure that I am actually getting all of them. To test how long the streaming API takes to deliver a tweet, I posted a tweet from my own Twitter account, but I never received it, even after a long wait.
Does twitter4j catch each and every tweet posted on Twitter, or does it lose a good percentage of them? Or am I doing something wrong here? Here's the code that I am using to get the tweets:
    StatusListener listener = new StatusListener() {
        int countTweets = 0;    // Count to implement batch processing
        public void onStatus(Status status) {
            countTweets++;
            StatusDto statusDto = new StatusDto(status);
            session.saveOrUpdate(statusDto);
            // Save 1 round of tweets to the database
            if (countTweets == BATCH_SIZE) {
                countTweets = 0;
                session.flush();
                session.clear();
            }
        }
        public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {}
        public void onTrackLimitationNotice(int numberOfLimitedStatuses) {}
        public void onException(Exception ex) {
            ex.printStackTrace();
        }
        public void onScrubGeo(long arg0, long arg1) {
            // TODO Auto-generated method stub
        }
    };
    ConfigurationBuilder cb = new ConfigurationBuilder();
    cb.setDebugEnabled(true)
      .setOAuthConsumerKey(Twitter4jProperties.CONSUMER_KEY)
      .setOAuthConsumerSecret(Twitter4jProperties.CONSUMER_SECRET)
      .setOAuthAccessToken(Twitter4jProperties.ACCESS_TOKEN)
      .setOAuthAccessTokenSecret(Twitter4jProperties.ACCESS_TOKEN_SECRET);
    TwitterStream twitterStream = new TwitterStreamFactory(cb.build()).getInstance();
    twitterStream.addListener(listener);
    session = HibernateUtil.getSessionFactory().getCurrentSession();
    transaction = session.beginTransaction();
    // sample() internally creates a thread that reads the stream and calls the listener methods as statuses arrive.
    twitterStream.sample();
I'm open to contradiction on this, but I believe it works like this...
The Streaming API only gives non-partners a sample of tweets. That's the "garden hose", as opposed to the "firehose" that a few Twitter partners get. You can apply for full access, though.
.sample() gives you this "garden hose", so a single test tweet from your own account will most likely never show up in it. Your Twitter account won't have access to the firehose, although I think TwitterStream has a method for the firehose if you did have access.
Search for "statuses/sample" on this page for the specifics: https://dev.twitter.com/docs/streaming-api/methods
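To put a rough number on why one test tweet is a poor check, here is a minimal sketch in plain Java (no twitter4j needed). It assumes each tweet lands in the sample independently with some fixed probability. The 1% rate below is purely a hypothetical figure for illustration, not something I'm quoting from Twitter, and the names SampleOdds and chanceOfSeeingAtLeastOne are made up for this example.

```java
class SampleOdds {
    /**
     * Probability that at least one of `tweets` test tweets shows up in a
     * sampled stream, assuming each tweet is included independently with
     * probability `sampleRate` (a simplifying assumption).
     */
    static double chanceOfSeeingAtLeastOne(double sampleRate, int tweets) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in [0, 1]");
        }
        if (tweets < 0) {
            throw new IllegalArgumentException("tweets must be non-negative");
        }
        // Complement of "every one of the tweets was left out of the sample".
        return 1.0 - Math.pow(1.0 - sampleRate, tweets);
    }

    public static void main(String[] args) {
        // ASSUMPTION: hypothetical sample rate, chosen only for illustration.
        double assumedRate = 0.01;
        for (int n : new int[] {1, 10, 100}) {
            double chance = chanceOfSeeingAtLeastOne(assumedRate, n);
            System.out.printf("%d test tweet(s): %.1f%% chance of seeing at least one%n",
                    n, 100.0 * chance);
        }
    }
}
```

Under that assumption, a single tweet has the same chance of appearing as the sample rate itself, so not seeing your own tweet doesn't tell you that twitter4j is dropping anything.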