Nostrum.Api.Ratelimiter (Nostrum v0.10.0)
Handles REST calls to the Discord API while respecting ratelimits.
Purpose
Discord's API returns information about ratelimits that we must respect. This module performs serialization of these requests through a single process, thus preventing concurrency issues from arising if two processes make a remote API call at the same time.
Internal module
This module is intended exclusively for use inside of nostrum, and is documented for completeness and for people curious to look behind the covers.
Asynchronous requests
The ratelimiter is fully asynchronous internally. In theory, it also supports queueing requests in an asynchronous manner. However, support for this is currently not implemented in Nostrum.Api.
If you want to make one or multiple asynchronous requests manually, you can use the following pattern:
req = :gen_statem.send_request(Nostrum.Api.Ratelimiter, {:queue, request})
# ...
response = :gen_statem.receive_response(req, timeout)
where request is a map describing the request to run - see Nostrum.Api for more information. You can also send multiple requests at the same time and wait for their responses: see :gen_statem.reqids_add/3 and :gen_statem.wait_response/3 for more information.
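For instance, waiting on several requests at once could look like the following sketch, where request_a and request_b stand in for request maps as described above:
req_a = :gen_statem.send_request(Nostrum.Api.Ratelimiter, {:queue, request_a})
req_b = :gen_statem.send_request(Nostrum.Api.Ratelimiter, {:queue, request_b})
collection =
  :gen_statem.reqids_new()
  |> then(&:gen_statem.reqids_add(req_a, :request_a, &1))
  |> then(&:gen_statem.reqids_add(req_b, :request_b, &1))
# Responses arrive in completion order; the label tells them apart.
{{:reply, response}, label, collection} = :gen_statem.wait_response(collection, 5_000, true)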
Multi-node
nostrum will transparently distribute client requests across all ratelimiters running in the cluster. This allows us to account for per-route ratelimits whilst still distributing work across cluster nodes. Note that the API enforces a global user ratelimit across all requests, which we cannot account for using this method.
Inner workings
When a client process wants to perform some request on the Discord API, it sends a request to the :gen_statem behind this module to ask it to :queue the incoming request.
Connection setup
If the state machine is not connected to the HTTP endpoint, it will transition to the :connecting state and try to open the connection. If this succeeds, it transitions to the :connected state.
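As a rough sketch (not nostrum's actual implementation), the transition could look like this, using :gun's asynchronous connection messages:
def connecting(:internal, :open, data) do
  # :gun.open/3 is asynchronous; a :gun_up message arrives once the
  # connection is established (port 443 defaults to TLS).
  {:ok, conn} = :gun.open(~c"discord.com", 443, %{protocols: [:http2]})
  {:keep_state, %{data | conn: conn}}
end

def connecting(:info, {:gun_up, conn, _protocol}, %{conn: conn} = data) do
  {:next_state, :connected, data}
end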
Queueing requests
The state machine associates a :queue.queue/1 of queued_request/0 to each individual bucket, together with an internal count of remaining calls. When queueing requests, the following cases occur:
If there are no remaining calls in the bot's global ratelimit bucket, or there are no remaining calls in the bucket, the request is put into the bucket's queue.
If there is an :initial running request to the bucket, the request is put into the bucket's queue.
If there are more than 0 remaining calls on both the request-specific bucket and the global bucket, the request is started right away. This allows nostrum to dispatch multiple requests to the same endpoint as soon as possible as long as calls remain.
If no ratelimit information is known for the bucket and there are remaining calls on the global bucket, the request is sent out as the "pioneer" request that will retrieve how many calls we have for this bucket (:initial, see above).
If none of the above is true, a new queue is created and the pending request is marked as the :initial request. It will be run as soon as the bot's global limit expires.
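The following sketch illustrates this case analysis. It is illustrative only, not the actual implementation; run_request/4 is a hypothetical helper that dispatches a request and records it as running:
defp on_queue(bucket, request, from, %{outstanding: outstanding} = data) do
  case {data.remaining_in_window, Map.get(outstanding, bucket)} do
    {global, {remaining, queue}} when global == 0 or remaining == 0 ->
      # No calls left globally or on this bucket: enqueue the request.
      put_in(data.outstanding[bucket], {remaining, :queue.in({request, from}, queue)})

    {_global, {:initial, queue}} ->
      # A pioneer request is in flight: wait for its ratelimit headers.
      put_in(data.outstanding[bucket], {:initial, :queue.in({request, from}, queue)})

    {_global, {remaining, _queue}} when remaining > 0 ->
      # Calls remain on both buckets: dispatch right away.
      run_request(bucket, request, from, data)

    {global, nil} when global > 0 ->
      # Unknown bucket with global calls left: send the "pioneer" request.
      data = put_in(data.outstanding[bucket], {:initial, :queue.new()})
      run_request(bucket, request, from, data)

    {_global, nil} ->
      # Unknown bucket and no global calls left: queue until the global reset.
      put_in(data.outstanding[bucket], {:initial, :queue.in({request, from}, :queue.new())})
  end
end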
The request starting function, :next, will start new requests from the queue as long as more calls are possible in the timeframe. Any requests are then started asynchronously. Bookkeeping is set up to associate the resulting :gun.stream_ref/0 with the original client along with its request and the ratelimiter bucket.
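Sketched out, such a dispatch loop could look like the following (again illustrative; run_request/4 is the same hypothetical helper as above):
def connected(:internal, {:next, bucket}, data) do
  with {remaining, queue} when is_integer(remaining) and remaining > 0 <- data.outstanding[bucket],
       {{:value, {request, from}}, rest} <- :queue.out(queue) do
    data = run_request(bucket, request, from, data)
    data = put_in(data.outstanding[bucket], {remaining - 1, rest})
    # Re-enter :next until the queue is empty or no calls remain.
    {:keep_state, data, [{:next_event, :internal, {:next, bucket}}]}
  else
    _ -> :keep_state_and_data
  end
end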
Results from the HTTP connection are delivered non-blocking: simple responses with only a status code and no body (code 204) will be sent in a single message, other responses will be sent to us incrementally. To finally deliver the full response body to the client with the final package, an internal buffer of the body is kept. A possible future optimization could be having a way for :gun to only send the ratelimiter state machine the initial :gun_response and forward any item of the body directly to the client.
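A simplified sketch of that buffering (the reply shape here is an assumption, not nostrum's exact one):
def connected(:info, {:gun_data, _conn, stream, :nofin, chunk}, data) do
  # Append the chunk to the in-flight buffer for this stream.
  {status, headers, body} = Map.fetch!(data.inflight, stream)
  {:keep_state, put_in(data.inflight[stream], {status, headers, body <> chunk})}
end

def connected(:info, {:gun_data, _conn, stream, :fin, chunk}, data) do
  # Final chunk: deliver the buffered response to the waiting client.
  {{status, headers, body}, inflight} = Map.pop(data.inflight, stream)
  {_bucket, _request, from} = Map.fetch!(data.running, stream)
  :gen_statem.reply(from, {:ok, {status, headers, body <> chunk}})
  {:keep_state, %{data | inflight: inflight, running: Map.delete(data.running, stream)}}
end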
When the headers for a request have been received, the ratelimiter parses the ratelimit information and starts an internal timer that expires when the ratelimits expire. It will also reschedule calls with the :next internal event for as many remaining calls as it knows about. Once the timer expires for the current bucket, two cases can happen:
The queue has items: schedule all items and repeat this later.
The queue is empty: delete the queue and remaining calls from the outstanding buckets.
In practice, this means that we never store more information than we need, and it removes the regular bucket sweeping that the previous ratelimit bucket storage required.
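A hedged sketch of the header parsing step; the X-RateLimit-* header names are Discord's documented ones, but the helper itself is illustrative:
defp parse_ratelimits(headers) do
  headers = Map.new(headers)
  {remaining, ""} = Integer.parse(Map.fetch!(headers, "x-ratelimit-remaining"))
  {reset_after, ""} = Float.parse(Map.fetch!(headers, "x-ratelimit-reset-after"))
  # The caller would store `remaining` for the bucket and start a
  # timeout firing after `reset_after` seconds (returned here in ms).
  {remaining, round(reset_after * 1000)}
end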
Global ratelimits (note this is a distinct ratelimit from the bot's "global", per-user ratelimit) are handled with the special global_limit state. This state is entered for exactly the X-Ratelimit-Reset-After time provided in the global ratelimit response. It does nothing apart from postponing any events it receives and returning to the previous state (:connected) once the global timeout has passed. Requests that failed because of the global ratelimit are requeued after returning to the regular state: a warning is logged to inform you of this.
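In gen_statem terms, the state could be sketched like this (illustrative only):
# Entered with a state timeout of the X-Ratelimit-Reset-After duration.
def global_limit(:state_timeout, :resume, data) do
  {:next_state, :connected, data}
end

def global_limit(_kind, _event, _data) do
  # Postpone everything until the global window has passed.
  {:keep_state_and_data, :postpone}
end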
Failure modes
HTTP connection death
If the HTTP connection dies, the ratelimiter will inform each affected client by replying with {:error, {:connection_died, reason}}, where reason is the reason as provided by the :gun_down event. It will then transition to the :disconnected state. If no requests were running at the time the connection was shut down - for instance, because we simply reached the maximum idle time on the HTTP/2 connection - we simply move on.
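Sketched out, that handling might look as follows (the :gun_down message shape assumed here follows :gun 2.x):
def connected(:info, {:gun_down, conn, _protocol, reason, _killed_streams}, %{conn: conn} = data) do
  # Inform every client with a running request, then disconnect.
  for {_stream, {_bucket, _request, from}} <- data.running do
    :gen_statem.reply(from, {:error, {:connection_died, reason}})
  end

  {:next_state, :disconnected, %{data | running: %{}, inflight: %{}}}
end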
Upstream errors
The ratelimiter works by queueing requests aggressively as soon as it has the ratelimit information to do so. If no ratelimit information is available - for instance, because Discord returned a 502 status code - the ratelimiter will not automatically kick the queue to start further requests.
Other internal issues
Any other internal problems that are not handled appropriately in the ratelimiter will crash it, effectively resulting in the complete loss of any queued requests.
Implementation benefits & drawbacks
A history of ratelimiting
First, it is important to give a short history of nostrum's ratelimiting: pre 0.8, nostrum used a GenServer that would call out to ETS tables to look up ratelimiting buckets for requests. If it needed to sleep before issuing a request due to the bucket being exhausted, it would do so in the server process and block other callers.
In nostrum 0.8, the existing ratelimiter bucket storage architecture was refactored to be based around the pluggable caching functionality, and buckets with no remaining calls were adjusted to be slept out on the client side by having the GenServer respond to the client with {:error, {:retry_after, millis}} and the client trying again and again to schedule its requests. This allowed users to distribute their ratelimit buckets however they wished; out of the box, nostrum shipped with an ETS-based and a Mnesia-based ratelimit bucket store.
Problems we solved
The approach above still came with a few problems:
Requests were still being done synchronously in the ratelimiter, and it was blocked from anything else whilst running the requests, even though we are theoretically free to start requests for other buckets while one is still running.
The ratelimiter itself was half working on its own, but half required the external storage mechanisms, which made the code hard to follow and required regular automatic pruning because the store had no idea when a bucket was no longer relevant on its own.
Requests would not be pipelined to run as soon as ideally possible.
The ratelimiter did not inform clients if their request died in-flight.
If the client disconnected before we returned the response, we had to handle this explicitly via handle_info.
The new state machine-based ratelimiter solves these problems.
Types
@type bucket() :: String.t()
A bucket for endpoints under the same ratelimit.
@type queued_request() :: {request(), client :: :gen_statem.from()}
A bucket-specific request waiting to be queued, alongside its client.
@type remaining() :: non_neg_integer() | :initial
Remaining calls on a route, as provided by the API response.
The ratelimiter internally counts the remaining calls per route to dispatch new requests as soon as it's capable of doing so, but this is only possible if the API already provided us with ratelimit information for an endpoint.
Therefore, if the initial call on an endpoint is made, the special :initial value is specified. This is used by the limit parsing function to set the remaining calls if and only if it is the response for the initial call - otherwise, the value won't represent the truth anymore.
@type request() :: %{
  method: :get | :post | :put | :delete,
  route: String.t(),
  body: iodata(),
  headers: [{String.t(), String.t()}],
  params: Enum.t()
}
A request to make in the ratelimiter.
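For illustration, a request to fetch the gateway URL might look like the following (the exact route and headers are just an example):
%{
  method: :get,
  route: "/gateway",
  body: "",
  headers: [{"content-type", "application/json"}],
  params: []
}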
@type state() :: %{
  outstanding: %{
    required(bucket()) => {remaining(), :queue.queue(queued_request())}
  },
  running: %{
    required(:gun.stream_ref()) => {bucket(), request(), :gen_statem.from()}
  },
  inflight: %{
    required(:gun.stream_ref()) =>
      {status :: non_neg_integer(), headers :: [{String.t(), String.t()}],
       body :: String.t()}
  },
  conn: pid() | nil,
  remaining_in_window: non_neg_integer(),
  wrapped_token: Nostrum.Api.Base.wrapped_token()
}
The state of the ratelimiter.
While this has no public use, it is still documented here to provide help when tracing the ratelimiter via :sys.trace/2 or other means.
Fields
:outstanding: Outstanding (unqueued) requests per bucket, alongside the remaining calls that may be made on said bucket.
:running: Requests that have been sent off. Used to associate the client back with a request when the response comes in.
:inflight: Requests for which we have started getting a response, but which we have not fully received yet. For responses that have a body, this will buffer the body until we can send it back to the client.
:conn: The :gun connection backing the server. Used for making new requests, and updated as the state changes.
:remaining_in_window: How many calls we may still make to the API during this time window. Reset automatically via timeouts.
:wrapped_token: An anonymous function that is internally used to retrieve the token. This is wrapped to ensure that it is not accidentally exposed in stacktraces.
Functions
Callback implementation for :gen_statem.callback_mode/0.
Callback implementation for :gen_statem.code_change/4.
Retrieves a proper ratelimit endpoint from a given route and url.
Callback implementation for :gen_statem.init/1.
Queue the given request and wait for the response synchronously.
Ratelimits on the endpoint are handled by the ratelimiter. Global ratelimits will cause this to return an error.
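A hedged usage example; the request map shape follows request/0 above, the route is just an illustration, and the exact return value depends on the request:
request = %{method: :get, route: "/gateway", body: "", headers: [], params: []}
response = Nostrum.Api.Ratelimiter.queue(request)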
@spec start_link({String.t(), [:gen_statem.start_opt()]}) :: :gen_statem.start_ret()
Starts the ratelimiter.