I have a Sidekiq worker that shouldn't take more than 30 seconds, but after a few days I'll find that the entire queue has stopped executing because all of the workers are locked up.
Here is my worker:
class MyWorker
  include Sidekiq::Worker
  include Sidekiq::Status::Worker
  sidekiq_options queue: :my_queue, retry: 5, timeout: 4.minutes
  sidekiq_retry_in do |count|
    5
  end
  sidekiq_retries_exhausted do |msg|
    store({message: "Gave up."})
  end
  def perform(id)
    begin
      Timeout.timeout(3.minutes) do
        got_lock = with_semaphore("lock_#{id}") do
          # DO WORK
        end
      end
    rescue ActiveRecord::RecordNotFound => e
      # Handle
    rescue Timeout::Error => e
      # Handle
      raise e
    end
  end
  def with_semaphore(name, &block)
    Semaphore.get(name, {stale_client_timeout: 1.minute}).lock(1, &block)
  end
end
And here is the semaphore class we use (it wraps the redis-semaphore gem):
class Semaphore
  def self.get(name, options = {})
    Redis::Semaphore.new(name.to_sym,
      redis: Application.redis,
      stale_client_timeout: options[:stale_client_timeout] || 1.hour
    )
  end
end
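For context, a minimal standalone use of Redis::Semaphore looks like this (a sketch assuming a local Redis instance; the lock name and return-value handling are illustrative, not from our app):

require "redis"
require "redis-semaphore"

redis = Redis.new # assumes Redis on localhost:6379

s = Redis::Semaphore.new(:lock_123, redis: redis, stale_client_timeout: 60)

# Wait up to 1 second for the lock, as the worker above does with lock(1).
result = s.lock(1) do |token|
  # Critical section. Note: if this block runs longer than
  # stale_client_timeout, another client may treat the lock as stale
  # and reclaim it while this block is still running.
  "did work holding token #{token}"
end

puts result || "could not acquire the lock within 1 second"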
Basically, when I stop the worker, Sidekiq::Status will report something like done: 10000 seconds, and the worker should never run anywhere near that long.
Does anyone have ideas on what is causing this or how to fix it? The workers are running on EngineYard.
Edit: One additional comment. The # DO WORK section has a chance to fire off a PostgreSQL function. I have noticed mentions of PG::TRDeadlockDetected: ERROR: deadlock detected in the logs. Would this cause the worker to never complete, even with a timeout set?
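For what it's worth, Ruby's Timeout delivers its interrupt via Thread#raise, and a thread blocked inside a C extension (such as libpq waiting on the database socket) may never observe it, so a database-side limit is a more dependable guard. A minimal sketch, assuming ActiveRecord and a hypothetical my_long_running_function:

# Cap the statement at the database level so a blocked query is
# cancelled by PostgreSQL itself instead of relying on Timeout.timeout.
ActiveRecord::Base.transaction do
  # SET LOCAL scopes the setting to this transaction only.
  ActiveRecord::Base.connection.execute("SET LOCAL statement_timeout = '150s'")
  ActiveRecord::Base.connection.execute("SELECT my_long_running_function()")
end
# When the limit is exceeded, the pg gem raises PG::QueryCanceled
# (surfaced through ActiveRecord), which can be rescued like any error.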
Given that you want to ensure unique job execution, I would try removing all of the locking and delegating job-uniqueness control to a plugin like Sidekiq Unique Jobs.
In that case, even if Sidekiq enqueues the same job twice, the plugin ensures it is enqueued and processed only once.
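For example, with the sidekiq-unique-jobs gem (a sketch assuming v7+, which also requires adding the gem's middleware to your Sidekiq configuration; the worker body is simplified):

# Gemfile: gem "sidekiq-unique-jobs"
class MyWorker
  include Sidekiq::Worker

  # :until_executed holds a uniqueness lock from the moment the job is
  # pushed until perform finishes, so a duplicate push of the same
  # (class, args) pair is dropped instead of running concurrently.
  sidekiq_options queue: :my_queue, retry: 5, lock: :until_executed

  def perform(id)
    # DO WORK -- no Redis::Semaphore or Timeout.timeout needed here;
    # mutual exclusion per id comes from the uniqueness lock.
  end
end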