I’ve been working on a Rails application that integrates with the NetDocuments API, where multiple users share a single service account. The API issues single-use refresh tokens for obtaining new access tokens. But every few months, we’d hit a frustrating error: `unexpected token at 'Invalid grant'`.
After digging into the logs, I realized the root cause: race conditions. When multiple threads or processes tried to refresh the token simultaneously, the first request would invalidate the refresh token, causing subsequent requests to fail. To make matters worse, our token expiration logic was flawed: we hardcoded a 1-hour expiry instead of using the API’s `expires_in` value.
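To make the failure mode concrete, here is a minimal sketch. `FakeTokenServer` is a hypothetical in-memory stand-in (not the NetDocuments API) that rotates its refresh token on every use, which is how single-use refresh tokens behave. Two callers that both read the same refresh token before either refreshes end up with one success and one rejection.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a token endpoint that issues
# single-use refresh tokens. Names here are illustrative only.
class FakeTokenServer
  class InvalidGrant < StandardError; end

  attr_reader :current_refresh_token

  def initialize
    @current_refresh_token = SecureRandom.hex(8)
  end

  # Exchanges a refresh token for a new token pair and rotates the
  # refresh token, so the one just presented can never be used again.
  def refresh(refresh_token)
    raise InvalidGrant, "Invalid grant" unless refresh_token == @current_refresh_token

    @current_refresh_token = SecureRandom.hex(8)
    {
      "access_token" => SecureRandom.hex(8),
      "refresh_token" => @current_refresh_token,
      "expires_in" => 3600
    }
  end
end

server = FakeTokenServer.new
stale_token = server.current_refresh_token # both workers read this value

server.refresh(stale_token) # worker A succeeds and rotates the token

begin
  server.refresh(stale_token) # worker B reuses the now-invalid token
rescue FakeTokenServer::InvalidGrant => e
  puts e.message # prints Invalid grant
end
```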
Prevent Race Conditions with Database Locks
The core issue was concurrent token refreshes. To solve this, I used database row-level locking to ensure only one thread/process could refresh the token at a time.
```ruby
def self.netdocuments_credential
  # Fetch the credential and lock it to block concurrent access
  credential = Credential.find_by(kind: 'netdocuments')
  return unless credential

  credential.with_lock do
    # Reload ensures we have the latest data after acquiring the lock
    credential.reload
    if credential.expired?
      refresh_credential(credential)
    end
  end

  credential.reload
  credential
rescue ActiveRecord::RecordNotFound
  Rails.logger.error("NetDocuments credential not found.")
  nil
end
```
Why this works:
- `with_lock` locks the database row, preventing other threads from modifying the credential until the lock is released.
- `credential.reload` ensures we’re working with the freshest data after acquiring the lock.
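The lock-then-recheck pattern can be tried outside Rails. The sketch below is an illustration under assumptions: `InMemoryCredential` is a hypothetical class, and a `Mutex` stands in for the database row lock that `with_lock` takes. Ten threads find the token expired, yet only one refresh happens, because every thread that waited on the lock re-checks the expiry after acquiring it.

```ruby
# Hypothetical stand-in for the credential row. The Mutex plays the
# role of the row lock; it only coordinates threads in one process.
class InMemoryCredential
  attr_reader :refresh_count

  def initialize(expires_at:)
    @expires_at = expires_at
    @lock = Mutex.new
    @refresh_count = 0
  end

  def expired?(now = Time.now)
    now >= @expires_at
  end

  # Take the lock first, then check expiry. Threads that waited on the
  # lock see the refreshed expiry and skip the refresh.
  def ensure_fresh
    @lock.synchronize do
      refresh! if expired?
    end
    self
  end

  private

  # Placeholder for the real token request.
  def refresh!
    sleep 0.01 # simulate network latency so the threads overlap
    @refresh_count += 1
    @expires_at = Time.now + 3600
  end
end

credential = InMemoryCredential.new(expires_at: Time.now - 1)
threads = Array.new(10) { Thread.new { credential.ensure_fresh } }
threads.each(&:join)
puts credential.refresh_count # prints 1
```

A `Mutex` cannot coordinate separate processes, which is why the real code relies on the database lock instead.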
Use the Correct Expiration Time
Initially, we hardcoded `expires_at` to 1 hour (`3600` seconds). But the API returns an `expires_in` value indicating the actual token lifespan. With a hardcoded value, our stored expiry could be too early (needless refreshes) or too late (requests sent with an already-expired token).
```ruby
def self.update_credential(credential, response_body)
  credential.update!(
    token: response_body["access_token"],
    refresh_token: response_body["refresh_token"],
    expires_at: Time.now + response_body["expires_in"].to_i
  )
end
```
Key takeaway: Always use the API’s `expires_in` value to calculate expiration.
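One detail worth considering: `nil.to_i` is `0`, so a response without `expires_in` would silently produce a token that is already expired. The sketch below shows a stricter variant. `expires_at_from` is a hypothetical helper, and the injectable `now:` parameter is an assumption added to make the arithmetic easy to check.

```ruby
# Hypothetical helper: turns a token response into an absolute expiry.
# Raises KeyError if "expires_in" is missing and ArgumentError if it is
# not a valid integer, instead of quietly falling back to 0.
def expires_at_from(response_body, now: Time.now)
  expires_in = Integer(response_body.fetch("expires_in"))
  now + expires_in
end

issued_at = Time.utc(2024, 1, 1, 12, 0, 0)
puts expires_at_from({ "expires_in" => 1800 }, now: issued_at)
# prints 2024-01-01 12:30:00 UTC
```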
Add Robust Error Handling & Retries
APIs fail: network issues, timeouts, or server errors. We added retries for transient errors and explicit handling for `invalid_grant`:
```ruby
def self.refresh_credential(credential)
  retries ||= 0
  resp = Faraday.post("#{API_BASE_URL[get_data_region]}/v1#{TOKEN_PATH}") do |req|
    # ... token refresh logic ...
  end
  handle_refresh_response(credential, resp)
rescue Faraday::Error => e
  # `retry` re-runs the method body; `retries ||= 0` keeps the running count
  retry if (retries += 1) < 3
  Rails.logger.error("NetDocuments connection failed: #{e.message}")
  raise
end

def self.handle_refresh_error(credential, resp)
  # String key, so the lookup below also works for the fallback value
  error_body = JSON.parse(resp.body) rescue { "error" => "unknown" }
  case error_body["error"]
  when 'invalid_grant'
    Rails.logger.error("Invalid grant error: #{error_body}")
    credential.destroy # Force re-authentication
    raise StandardError, "Re-authentication required."
  else
    Rails.logger.error("Token refresh error: #{error_body}")
    raise StandardError, "Failed to refresh token."
  end
end
```
What this does:
- Retries on Faraday errors, for up to 3 attempts in total.
- Destroys the credential on `invalid_grant`, forcing a re-authentication flow.
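The error quoted at the top of this post, `unexpected token at 'Invalid grant'`, has the shape of a `JSON::ParserError` message, which suggests the error body is not always JSON. The sketch below is one way to classify both shapes. `oauth_error_code` is a hypothetical helper, and mapping the plain-text body to `invalid_grant` is an assumption about what the server sends, not documented API behavior.

```ruby
require "json"

# Hypothetical helper: extracts an OAuth error code from a response
# body that may be JSON ({"error": "..."}) or plain text.
def oauth_error_code(body)
  parsed = JSON.parse(body.to_s)
  parsed.is_a?(Hash) ? parsed.fetch("error", "unknown").to_s : "unknown"
rescue JSON::ParserError
  # Plain-text body: normalize e.g. "Invalid grant" to "invalid_grant"
  text = body.to_s.strip
  text.empty? ? "unknown" : text.downcase.tr(" ", "_")
end

puts oauth_error_code('{"error":"invalid_grant"}') # prints invalid_grant
puts oauth_error_code("Invalid grant")             # prints invalid_grant
```

With a helper like this, a plain-text rejection reaches the `invalid_grant` branch instead of being reported as an unknown error.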
Enhance Observability
We added logging to track token refreshes and errors:
```ruby
Rails.logger.info("Token refreshed at #{Time.now} for NetDocuments.")
```
This helps debug issues and monitor token lifecycle events.
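If you want log lines that are easier to search, a key=value format is one option. This is a sketch using Ruby’s standard `Logger` rather than `Rails.logger`; `log_token_event` and its field names are hypothetical. It logs the expiry but never the token values themselves.

```ruby
require "logger"
require "time"

# Hypothetical helper: logs a token lifecycle event as key=value pairs.
# Token and refresh token values are deliberately left out.
def log_token_event(logger, event, expires_at: nil)
  fields = { event: event, service: "netdocuments" }
  fields[:expires_at] = expires_at.utc.iso8601 if expires_at
  logger.info(fields.map { |key, value| "#{key}=#{value}" }.join(" "))
end

logger = Logger.new($stdout)
log_token_event(logger, "token_refreshed", expires_at: Time.now + 1800)
```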
Leveling Up: Advanced Practices
Once the core logic worked, I added these improvements:
Background Token Refresh
Use a job scheduler (like Sidekiq) to refresh tokens before they expire:
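```ruby
class TokenRefreshJob < ApplicationJob
  def perform
    credential = Credential.find_by(kind: 'netdocuments')
    # Skip when there is no credential or no recorded expiry
    return unless credential&.expires_at
    # Only refresh tokens that expire within the next 5 minutes
    return unless credential.expires_at < 5.minutes.from_now

    # Trigger refresh logic here
  end
end
```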
Proactive refreshes reduce latency during API calls.
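The "is a refresh due?" decision is easy to get wrong at the boundaries, so it can help to pull it into a small predicate that can be tested without Rails. `refresh_due?` is a hypothetical helper, and treating a missing expiry as due is a design choice of this sketch (the job above skips that case instead).

```ruby
# Hypothetical predicate: true when the token expires within
# `lead_time` seconds, has already expired, or has no recorded expiry.
def refresh_due?(expires_at, now: Time.now, lead_time: 300)
  return true if expires_at.nil?

  expires_at - now <= lead_time
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
puts refresh_due?(now + 120, now: now)  # prints true
puts refresh_due?(now + 1800, now: now) # prints false
```

Whatever triggers the refresh, the job should go through the same lock-protected path as request threads, otherwise it becomes one more participant in the race.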
Circuit Breaker Pattern
Prevent cascading failures with a circuit breaker (using the `circuitbox` gem):
```ruby
def refresh_credential(credential)
  # Option names vary between circuitbox versions; check the gem's README
  Circuitbox.circuit(:netdocuments, timeout: 10).run do
    # Token refresh logic
  end
end
```
Once the circuit opens, calls fail fast instead of hitting the API, which reduces load on it after repeated failures.
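For readers who have not used a circuit breaker before, here is a minimal pure-Ruby sketch of the idea. It is not the circuitbox API: `SimpleCircuitBreaker`, its parameters, and the injectable `clock` are all hypothetical, and a production breaker would need more care (metrics, per-error filtering, shared state across processes).

```ruby
# Minimal circuit breaker sketch (illustrative, not the circuitbox API).
class SimpleCircuitBreaker
  class OpenError < StandardError; end

  def initialize(failure_threshold: 3, cool_down: 60, clock: -> { Time.now })
    @failure_threshold = failure_threshold
    @cool_down = cool_down
    @clock = clock
    @failures = 0
    @opened_at = nil
    @lock = Mutex.new
  end

  # Runs the block unless the circuit is open. Failures are counted;
  # reaching the threshold opens the circuit for `cool_down` seconds.
  def run
    @lock.synchronize { raise OpenError, "circuit open" if open? }

    begin
      result = yield
    rescue StandardError
      @lock.synchronize { record_failure }
      raise
    end
    @lock.synchronize { reset }
    result
  end

  private

  def open?
    return false unless @opened_at
    return true if @clock.call - @opened_at < @cool_down

    # Cool-down elapsed: let one trial call through. A single failure
    # on that trial re-opens the circuit.
    @opened_at = nil
    @failures = @failure_threshold - 1
    false
  end

  def record_failure
    @failures += 1
    @opened_at = @clock.call if @failures >= @failure_threshold
  end

  def reset
    @failures = 0
    @opened_at = nil
  end
end

breaker = SimpleCircuitBreaker.new(failure_threshold: 2, cool_down: 30)
2.times do
  begin
    breaker.run { raise IOError, "token endpoint unreachable" }
  rescue IOError
    # the breaker has recorded this failure
  end
end

begin
  breaker.run { :never_called }
rescue SimpleCircuitBreaker::OpenError => e
  puts e.message # prints circuit open
end
```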
Admin Alerts
Notify admins when credentials need re-authentication:
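```ruby
def handle_refresh_error(credential, resp)
  # Parse the body first; fall back to an "unknown" error if it is not JSON
  error_body = JSON.parse(resp.body) rescue { "error" => "unknown" }
  if error_body["error"] == 'invalid_grant'
    AdminMailer.authentication_required.deliver_later
  end
end
```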
Thread-Safe Token Caching
Cache tokens in memory to reduce database hits:
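```ruby
TOKEN_CACHE_LOCK = Mutex.new

def self.netdocuments_credential
  # The mutex keeps threads in this process from racing on the cache;
  # the cache itself is per process, so the database lock is still needed
  TOKEN_CACHE_LOCK.synchronize do
    @token_cache ||= {}
    cached = @token_cache[:netdocuments]
    # Serve from memory only while the cached credential is still valid
    next cached if cached && !cached.expired?

    @token_cache[:netdocuments] = begin
      # Fetch from DB and refresh if needed
    end
  end
end
```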
Final Thoughts
Handling OAuth tokens in a multi-threaded environment is tricky, but manageable with:
- Database locks to prevent race conditions.
- Accurate expiration using the API’s `expires_in`.
- Retries & error handling for resilience.
- Proactive refreshes to avoid edge cases.
If I were to start over, I’d design the system with these principles from day one. The key lesson? Never assume token refreshes are thread-safe; always coordinate them.
Gotchas to Watch For:
- Ensure your database supports row-level locks (PostgreSQL/MySQL do).
- Test token expiration logic with realistic values (e.g., 30-minute tokens).
- Monitor logs for `invalid_grant`; it could indicate a compromised token.
By addressing these issues, we reduced `invalid_grant` errors to zero.