Nostrum.Voice (Nostrum v0.10.0)
Interface for playing and listening to audio through Discord's voice channels.
Using Discord Voice Channels
To play sound in Discord with Nostrum, you'll need ffmpeg to be installed. If you don't have the executable ffmpeg in the path, the absolute path may be configured through config keys :nostrum, :ffmpeg. If you don't want to use ffmpeg, read on to the next section.
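As a minimal sketch of the configuration described above, the following assumes a standard Mix project with a config/config.exs file. The path shown is hypothetical and should be replaced with the actual location of the executable on your system.

```elixir
# config/config.exs
import Config

config :nostrum,
  # Hypothetical absolute path; only needed when ffmpeg is not in the PATH.
  ffmpeg: "/usr/local/bin/ffmpeg"
```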
A bot may be connected to at most one voice channel per guild. For this reason, most of the functions in this module take a guild id, and the resulting action will be performed in the given guild's voice channel that the bot is connected to.
The primary Discord gateway responsible for all text based communication relies on one websocket connection per shard, where small bots typically only have one shard. The Discord voice gateways work by establishing a websocket connection per guild/channel. After some handshaking on this connection, audio data can be sent over UDP/RTP. Behind the scenes the voice websocket connections are implemented nearly the same way the main shard websocket connections are, and require no developer intervention.
In addition to playing audio, listening to incoming audio is supported through the functions listen/3 and start_listen_async/1.
Voice Without FFmpeg
If you wish to BYOE (Bring Your Own Encoder), there are a few options.
- Use :raw as type for play/4
  - Provide the complete list of opus frames as the input
- Use :raw_s as type for play/4
  - Provide a stateful enumerable of opus frames as input (think GenServer wrapped in Stream.unfold/2)
- Use lower level functions to send opus frames at your leisure
  - Send packets on your own time using send_frames/2
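The "stateful enumerable" idea for :raw_s might look roughly like the sketch below. FrameQueue is a hypothetical helper, not part of Nostrum, and it uses an Agent rather than a full GenServer to keep the example short. The key property is that every frame taken from the stream is removed from the process state, so enumerating the stream twice yields different frames.

```elixir
defmodule FrameQueue do
  # Hypothetical helper (not part of Nostrum): keeps opus frames in an Agent
  # and exposes them as a stateful enumerable.
  use Agent

  def start_link(frames) when is_list(frames) do
    Agent.start_link(fn -> frames end)
  end

  # Pops the next frame, or returns nil when the queue is empty.
  def pop(pid) do
    Agent.get_and_update(pid, fn
      [] -> {nil, []}
      [frame | rest] -> {frame, rest}
    end)
  end

  # Each element pulled from this stream is consumed from the Agent,
  # so the stream ends once the queue has been drained.
  def stream(pid) do
    Stream.unfold(pid, fn pid ->
      case pop(pid) do
        nil -> nil
        frame -> {frame, pid}
      end
    end)
  end
end
```

Under these assumptions, the result of FrameQueue.stream(pid) could be passed as the input to play/4 with type :raw_s, while another process keeps the queue filled.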
Types
@type opus_packet() :: binary()
Opus packet
The play input
The input given to play/4, either a compatible URL or binary audio data.
See play/4 for more information.
@type play_type() :: :url | :pipe | :ytdl | :stream | :raw | :raw_s
The type of play input
The type given to play/4 determines how the input parameter is interpreted.
See play/4 for more information.
@type rtp_opus() :: {{rtp_sequence(), rtp_timestamp(), rtp_ssrc()}, opus_packet()}
Tuple with RTP header elements and opus packet
@type rtp_sequence() :: non_neg_integer()
RTP sequence
@type rtp_ssrc() :: non_neg_integer()
RTP SSRC
@type rtp_timestamp() :: non_neg_integer()
RTP timestamp
Functions
Returns a specification to start this module under a supervisor.
See Supervisor.
@spec connect_to_gateway(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}
Low-level. Manually connect to voice websockets gateway.
This function should only be called if config option :voice_auto_connect is set to false.
By default Nostrum will automatically create a voice gateway when joining a channel.
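A minimal configuration sketch for opting out of the automatic behaviour, assuming a standard config/config.exs file. With this setting, calling connect_to_gateway/1 becomes the application's responsibility.

```elixir
# config/config.exs
import Config

# Disable automatic voice gateway connections; the bot must then call
# Nostrum.Voice.connect_to_gateway/1 itself.
config :nostrum, voice_auto_connect: false
```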
@spec create_ogg_bitstream([opus_packet()]) :: [binary()]
Create a complete Ogg logical bitstream from a list of Opus packets.
This function takes a list of opus packets and returns a list of Ogg encapsulated Opus pages for a single Ogg logical bitstream.
It is highly recommended to learn about the Ogg container format to understand how to use the data.
To get started, assuming you have a list of evenly temporally spaced and consecutive opus packets from a single source that you want written to a file, you can run the following:
bitstream =
  opus_packets
  |> create_ogg_bitstream()
  |> :binary.list_to_bin()

File.write!("my_recording.ogg", bitstream)
When creating a logical bitstream, ensure that the packets are all from a single SSRC. When listening in a channel with multiple speakers, you should be storing the received packets in unique buckets for each SSRC so that the multiple audio sources don't become jumbled. A single logical bitstream should represent audio data from a single speaker. An Ogg physical bitstream (e.g. a file) may be composed of multiple interleaved Ogg logical bitstreams as each logical bitstream and its constituent pages contain a unique and randomly generated bitstream serial number, but this is a story for another time.
Assuming you have a list of rtp_opus/0 packets that are not separated by SSRC, you may do the following:
jumbled_packets
|> Stream.filter(fn {{_seq, _time, ssrc}, _opus} -> ssrc == particular_ssrc end)
|> Enum.map(fn {{_seq, _time, _ssrc}, opus} -> opus end)
|> create_ogg_bitstream()
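To keep every speaker rather than a single one, the "unique buckets for each SSRC" advice can be sketched with Enum.group_by/3. SsrcBuckets is a hypothetical helper name; it relies only on the rtp_opus/0 tuple shape documented above.

```elixir
defmodule SsrcBuckets do
  # Hypothetical helper: splits rtp_opus tuples into one list of opus
  # packets per SSRC, preserving the order of packets within each bucket.
  def by_ssrc(rtp_packets) do
    Enum.group_by(
      rtp_packets,
      fn {{_seq, _time, ssrc}, _opus} -> ssrc end,
      fn {{_seq, _time, _ssrc}, opus} -> opus end
    )
  end
end
```

Each value in the resulting map is a list of opus packets from one source and could be passed to create_ogg_bitstream/1 on its own.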
@spec extract_opus_packet(binary()) :: opus_packet()
Extract the opus packet from the RTP packet received from Discord.
Incoming voice RTP packets contain a fixed length RTP header and an optional RTP header extension, which must be stripped to retrieve the underlying opus packet.
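For intuition about what "stripping the header" involves, here is a sketch of parsing a plain, unencrypted RTP packet as laid out in RFC 3550. It is not Nostrum's implementation: RtpSketch is a hypothetical module, and it ignores the encryption and any Discord-specific details that extract_opus_packet/1 deals with.

```elixir
defmodule RtpSketch do
  # Generic RFC 3550 layout: a 12-byte fixed header, an optional CSRC list,
  # and an optional header extension, followed by the payload.
  def parse(
        <<2::2, _padding::1, ext::1, csrc_count::4, _marker::1, _payload_type::7,
          seq::16, timestamp::32, ssrc::32, rest::binary>>
      ) do
    # Skip the CSRC identifiers (4 bytes each), if any.
    csrc_bytes = csrc_count * 4
    <<_csrc::binary-size(csrc_bytes), rest::binary>> = rest
    {{seq, timestamp, ssrc}, strip_extension(ext, rest)}
  end

  defp strip_extension(0, payload), do: payload

  # The extension starts with a 16-bit profile id and a 16-bit length,
  # where the length counts 32-bit words of extension data.
  defp strip_extension(1, <<_profile::16, words::16, rest::binary>>) do
    ext_bytes = words * 4
    <<_extension::binary-size(ext_bytes), payload::binary>> = rest
    payload
  end
end
```

The returned tuple deliberately mirrors the rtp_opus/0 shape, with the sequence number, timestamp and SSRC taken from the fixed header.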
@spec get_channel_id(Nostrum.Struct.Guild.id()) :: Nostrum.Struct.Channel.id()
Gets the id of the voice channel that the bot is connected to.
Parameters
- guild_id - ID of guild that the resultant channel belongs to.
Returns the channel_id for the channel the bot is connected to, otherwise nil.
Examples
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.get_channel_id(123456789)
420691337
iex> Nostrum.Voice.leave_channel(123456789)
iex> Nostrum.Voice.get_channel_id(123456789)
nil
@spec get_current_url(Nostrum.Struct.Guild.id()) :: String.t() | nil
Gets the current URL being played.
If play/4 was invoked with type :url, :ytdl, or :stream, this function will return the URL given as input the last time play/4 was called.
If play/4 was invoked with type :pipe, :raw, or :raw_s, this will return nil as the input is raw audio data, not a readable URL string.
@spec get_ssrc_map(Nostrum.Struct.Guild.id()) :: Nostrum.Struct.VoiceWSState.ssrc_map()
Gets a map of RTP SSRC to user id.
Within a voice channel, an SSRC (synchronization source) will uniquely map to a user id of a user who is speaking.
If listening to incoming voice packets asynchronously, this function will not be needed as the Nostrum.Struct.VoiceWSState.ssrc_map/0 will be available with every event.
If listening with listen/3, this function may be used. It is recommended to cache the result of this function and only call it again when you encounter an SSRC that is not present in the cached result. This is to reduce excess load on the voice websocket and voice state processes.
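The caching recommendation could be sketched as below. SsrcCache is a hypothetical helper, and the fetch function is passed in as an argument so the lookup logic stays independent of the voice processes.

```elixir
defmodule SsrcCache do
  # Hypothetical helper: returns {user_id_or_nil, cache}. The fetch function
  # is only invoked when the SSRC is missing from the cached map.
  def lookup(cache, ssrc, fetch_fun) when is_map(cache) and is_function(fetch_fun, 0) do
    case Map.fetch(cache, ssrc) do
      {:ok, user_id} ->
        {user_id, cache}

      :error ->
        # Cache miss: refresh the whole map once and retry the lookup.
        fresh = fetch_fun.()
        {Map.get(fresh, ssrc), fresh}
    end
  end
end
```

In a bot this might be called as SsrcCache.lookup(cache, ssrc, fn -> Nostrum.Voice.get_ssrc_map(guild_id) end), threading the returned cache through the state of whichever process handles the packets.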
join_channel(guild_id, channel_id, self_mute \\ false, self_deaf \\ false, persist \\ true)
@spec join_channel(Nostrum.Struct.Guild.id(), Nostrum.Struct.Channel.id(), boolean(), boolean(), boolean()) :: no_return() | :ok
Joins or moves the bot to a voice channel.
This function calls Nostrum.Api.update_voice_state/4.
The fifth argument persist defaults to true. When true, if calling join_channel/5 while already in a different channel in the same guild, the audio source will be persisted in the new channel. If the audio is actively playing at the time of changing channels, it will resume playing automatically upon joining. If there is an active audio source that has been paused before changing channels, the audio can be resumed manually by calling resume/1.
If persist is set to false, the audio source will be destroyed before changing channels. The same effect is achieved by calling stop/1 or leave_channel/1 before join_channel/5.
@spec leave_channel(Nostrum.Struct.Guild.id()) :: no_return() | :ok
Leaves the voice channel of the given guild id.
This function is equivalent to calling Nostrum.Api.update_voice_state(guild_id, nil).
@spec listen(Nostrum.Struct.Guild.id(), pos_integer(), raw_rtp :: false) :: [rtp_opus()] | {:error, String.t()}
@spec listen(Nostrum.Struct.Guild.id(), pos_integer(), raw_rtp :: true) :: [binary()] | {:error, String.t()}
Listen for incoming voice RTP packets.
Parameters
- guild_id - ID of guild that the bot is listening to.
- num_packets - Number of packets to wait for.
- raw_rtp - Whether to return raw RTP packets. Defaults to false.
Returns a list of tuples of type rtp_opus/0.
The inner tuple contains fields from the RTP header and can be matched against to retrieve information about the packet such as the SSRC, which identifies the source. Note that RTP timestamps are completely unrelated to Unix timestamps.
If raw_rtp is set to true, a list of raw RTP packets is returned instead.
To extract an opus packet from an RTP packet, see extract_opus_packet/1.
This function will block until the specified number of packets is received.
@spec pad_opus([rtp_opus(), ...]) :: [opus_packet()]
Pad discontinuous chunks of opus audio with silence.
This function takes a list of rtp_opus/0, which is a tuple containing RTP bits and opus audio data. It returns a list of opus audio packets. The reason the input has to be in the rtp_opus/0 tuple format returned by listen/3 and async listen events is that the RTP packet header contains info on the relative timestamps of incoming packets; the opus packets themselves don't contain information relating to timing.
The Discord client will continue to internally increment the rtp_timestamp/0 when the user is not speaking such that the duration of pauses can be determined from the RTP packets. Bots will typically not behave this way, so if you call this function on audio produced by a bot it is very likely that no silence will be inserted.
The use case of this function is as follows: Consider a user who speaks for two seconds, pauses for ten seconds, then speaks for another two seconds. During the pause, no RTP packets will be received, so if you create a bitstream from the received packets, the resulting audio will play the two two-second speaking segments back to back, without the long pause in the middle. If you wish to preserve the timing of the speaking and include the pause, calling this function will interleave the appropriate amount of opus silence packets to maintain temporal fidelity.
Note that the Discord client currently sends about 10 silence packets (200 ms) each time it detects end of speech, so creating a bitstream without first padding your audio with this function will maintain short silences between speech segments.
This function should only be called on a collection of RTP packets from a single SSRC.
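The padding idea can be illustrated with a simplified sketch. This is not Nostrum's implementation. It assumes 48 kHz audio with 20 ms frames (960 timestamp units per frame), packets already in order from a single SSRC, and no 32-bit timestamp wraparound. PadSketch and its constants are hypothetical names, and the three-byte silence frame is the one Discord's voice documentation describes.

```elixir
defmodule PadSketch do
  # Assumption: 48_000 Hz * 0.020 s = 960 RTP timestamp units per opus frame.
  @units_per_frame 960
  # Opus silence frame as described in Discord's voice documentation.
  @silence <<0xF8, 0xFF, 0xFE>>

  def pad([]), do: []

  def pad([{{_seq, first_ts, _ssrc}, _opus} | _] = packets) do
    {frames, _expected_ts} =
      Enum.reduce(packets, {[], first_ts}, fn {{_seq, ts, _ssrc}, opus}, {acc, expected_ts} ->
        # Number of whole frames missing between the expected and actual timestamp.
        missing = max(div(ts - expected_ts, @units_per_frame), 0)
        silence = List.duplicate(@silence, missing)
        # The accumulator is built in reverse, newest frame first.
        {[opus | silence ++ acc], ts + @units_per_frame}
      end)

    Enum.reverse(frames)
  end
end
```

For example, three packets with timestamps 0, 960 and 3840 would come back as the first two opus packets, two silence frames, and then the third packet.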
@spec pause(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}
Pauses the current sound being played in a voice channel.
The bot must be connected to a voice channel in the guild specified.
Parameters
- guild_id - ID of guild whose voice channel the sound will be paused in.
Returns {:error, reason} if unable to pause or no sound is playing, else :ok.
This function is similar to stop/1, except that the sound may be resumed after being paused.
Examples
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.play(123456789, "~/files/twelve_hour_loop_of_waterfall_sounds.mp3")
iex> Nostrum.Voice.pause(123456789)
@spec play(Nostrum.Struct.Guild.id(), play_input(), play_type(), keyword()) :: :ok | {:error, String.t()}
Plays sound in the voice channel the bot is in.
The bot must be connected to a voice channel in the guild specified.
Parameters
- guild_id - ID of guild whose voice channel the sound will be played in.
- input - Audio to be played, play_input/0. Input type determined by type parameter.
- type - Type of input, play_type/0 (defaults to :url).
  - :url Input will be any url that ffmpeg can read.
  - :pipe Input will be data that is piped to stdin of ffmpeg.
  - :ytdl Input will be url for youtube-dl, which gets automatically piped to ffmpeg.
  - :stream Input will be livestream url for streamlink, which gets automatically piped to ffmpeg.
  - :raw Input will be an enumerable of raw opus packets. This bypasses ffmpeg and all options.
  - :raw_s Same as :raw but input must be stateful, i.e. calling Enum.take/2 on input is not idempotent.
- options - See options section below.
Returns {:error, reason} if unable to play or a sound is playing, else :ok.
Options
- :start_pos (string) - The start position of the audio to be played. Defaults to beginning.
- :duration (string) - The duration of the audio to be played. Defaults to entire duration.
- :realtime (boolean) - Make ffmpeg process the input in realtime instead of as fast as possible. Defaults to true.
- :volume (number) - The output volume of the audio. Default volume is 1.0.
- :filter (string) - Filter(s) to be applied to the audio. No filters applied by default.
The values of :start_pos and :duration can be any time duration that ffmpeg can read.
The :filter option can be used multiple times in a single call (see examples).
The values of :filter can be any audio filters that ffmpeg can read.
Filters will be applied in order and can be as complex as you want. The world is your oyster!
Note that using the :volume option is a shortcut for the "volume" filter, and will be added to the end of the filter chain, acting as a master volume.
Volume values between 0.0 and 1.0 act as the standard operating range where 0 is off and 1 is max.
Values greater than 1.0 will add saturation and distortion to the audio.
Negative values act the same as their positive counterparts but reverse the polarity of the waveform.
Having all the ffmpeg audio filters available is extremely powerful so it may be worth learning some of them for your use cases.
If you use any filters to increase the playback speed of your audio, it's recommended to set the :realtime option to false because realtime processing is relative to the original playback speed.
Examples
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.play(123456789, "~/music/FavoriteSong.mp3", :url)
iex> Nostrum.Voice.play(123456789, "~/music/NotFavoriteButStillGoodSong.mp3", :url, volume: 0.5)
iex> Nostrum.Voice.play(123456789, "~/music/ThisWillBeHeavilyDistorted.mp3", :url, volume: 1000)
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> raw_data = "~/music/sound_effect.wav" |> Path.expand() |> File.read!()
iex> Nostrum.Voice.play(123456789, raw_data, :pipe)
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.play(123456789, "https://www.youtube.com/watch?v=b4RJ-QGOtw4", :ytdl,
...> realtime: true, start_pos: "0:17", duration: "30")
iex> Nostrum.Voice.play(123456789, "https://www.youtube.com/watch?v=0ngcL_5ekXo", :ytdl,
...> filter: "lowpass=f=1200", filter: "highpass=f=300", filter: "asetrate=44100*0.5")
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.play(123456789, "https://www.twitch.tv/pestily", :stream)
iex> Nostrum.Voice.play(123456789, "https://youtu.be/LN4r-K8ZP5Q", :stream)
@spec playing?(Nostrum.Struct.Guild.id()) :: boolean()
Checks if the bot is playing sound in a voice channel.
Parameters
- guild_id - ID of guild to check if audio is being played.
Returns true if the bot is currently playing sound in a voice channel, otherwise false.
Examples
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.play(123456789, "https://a-real-site.biz/RickRoll.m4a")
iex> Nostrum.Voice.playing?(123456789)
true
iex> Nostrum.Voice.pause(123456789)
iex> Nostrum.Voice.playing?(123456789)
false
@spec ready?(Nostrum.Struct.Guild.id()) :: boolean()
Checks if the connection is up and ready to play audio.
Parameters
- guild_id - ID of guild to check if voice connection is up.
Returns true if the bot is connected to a voice channel, otherwise false.
This function does not check if audio is already playing. For that, use playing?/1.
Examples
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.ready?(123456789)
true
iex> Nostrum.Voice.leave_channel(123456789)
iex> Nostrum.Voice.ready?(123456789)
false
@spec resume(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}
Resumes playing the current paused sound in a voice channel.
The bot must be connected to a voice channel in the guild specified.
Parameters
- guild_id - ID of guild whose voice channel the sound will be resumed in.
Returns {:error, reason} if unable to resume or no sound has been paused, otherwise returns :ok.
This function is used to resume a sound that had previously been paused.
Examples
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.play(123456789, "~/stuff/Toto - Africa (Bass Boosted)")
iex> Nostrum.Voice.pause(123456789)
iex> Nostrum.Voice.resume(123456789)
@spec send_frames(Nostrum.Struct.Guild.id(), [opus_packet()]) :: :ok | {:error, String.t()}
Low-level. Send pre-encoded audio packets directly.
Speaking should be set to true via Nostrum.Voice.set_is_speaking/2 before sending frames.
Opus frames will be encrypted and prefixed with the appropriate RTP header and sent immediately. The length of frames depends on how often you wish to send a sequence of frames. A single frame contains 20ms of audio. Sending more than 50 frames (1 second of audio) in a single function call may result in inconsistent playback rates.
Nostrum.Voice.playing?/1 will not return accurate values when using send_frames/2 instead of Nostrum.Voice.play/4.
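One way to respect the 50-frame guideline is to split the frames into batches and pace them by their audio duration. The sketch below is a naive illustration: FrameSender is a hypothetical helper, the send and sleep functions are injected as arguments, and sleeping between batches will drift over time compared to a proper timer.

```elixir
defmodule FrameSender do
  # Each opus frame carries 20 ms of audio, as stated above.
  @frame_ms 20

  # Sends frames in batches of at most batch_size, waiting roughly the
  # duration of each batch before sending the next one.
  def send_in_batches(frames, send_fun, batch_size \\ 50, sleep_fun \\ &Process.sleep/1) do
    frames
    |> Enum.chunk_every(batch_size)
    |> Enum.each(fn batch ->
      send_fun.(batch)
      sleep_fun.(length(batch) * @frame_ms)
    end)
  end
end
```

Under these assumptions, a caller would first invoke Nostrum.Voice.set_is_speaking(guild_id, true), then FrameSender.send_in_batches(frames, &Nostrum.Voice.send_frames(guild_id, &1)), and finally set the speaking flag back to false.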
@spec set_is_speaking(Nostrum.Struct.Guild.id(), boolean()) :: :ok
Low-level. Set speaking flag in voice channel.
This function does not need to be called unless you are sending audio frames directly using Nostrum.Voice.send_frames/2.
@spec start_listen_async(Nostrum.Struct.Guild.id()) :: :ok | {:error, term()}
Start asynchronously receiving events for incoming RTP packets for an active voice session.
This is an alternative to the blocking listen/3. Events will be generated asynchronously when a user is speaking. See Nostrum.Consumer.voice_incoming_packet/0 for more info.
@spec stop(Nostrum.Struct.Guild.id()) :: :ok | {:error, String.t()}
Stops the current sound being played in a voice channel.
The bot must be connected to a voice channel in the guild specified.
Parameters
- guild_id - ID of guild whose voice channel the sound will be stopped in.
Returns {:error, reason} if unable to stop or no sound is playing, else :ok.
If a sound has finished playing, this function does not need to be called to start playing another sound.
Examples
iex> Nostrum.Voice.join_channel(123456789, 420691337)
iex> Nostrum.Voice.play(123456789, "http://brandthill.com/files/weird_dubstep_noises.mp3")
iex> Nostrum.Voice.stop(123456789)
@spec stop_listen_async(Nostrum.Struct.Guild.id()) :: :ok | {:error, term()}
Stop asynchronously receiving events for incoming RTP packets for an active voice session.