The mixer contains the following data structures (in mixer.c):
/* A single event mixing channel */
typedef struct {
    short *snd_buf;          /* array of sound data */
    unsigned int len;        /* length of data */
    unsigned int pos;        /* current position ptr */
    double stereo_pos;       /* stereo (left) */
    int filter_flag;         /* flag of effect filters to apply */
} EVENT_BUF;

/* A single state sound entry */
typedef struct {
    short **snd_buf;         /* array of state segments */
    unsigned int *len;       /* length of segments */
    unsigned int snd_cnt;    /* number of sounds per state */
    unsigned int snd_no;     /* current segment to look at */
    unsigned int pos;        /* position in that segment */
} STATE_SND;

/* Threshold describing the bounds within which a state sound applies */
typedef struct {
    double l_bound;          /* lower bound for the state */
    double h_bound;          /* upper bound for the state */
    STATE_SND state_snd;     /* the state sound associated with
                              * the entry
                              */
} THRESHOLD;

/* State buffer descriptor */
typedef struct {
    THRESHOLD *thresh;       /* threshold of each state segment */
    int thresh_cnt;          /* number of thresholds */
    double stereo_pos;       /* stereo (left) */
    double vol;              /* volume */
    int filter_flag;         /* flag of effect filters to apply */
} STATE_BUF;

/* Event mixing buffers */
static EVENT_BUF *ebuffs;
static unsigned int no_ebuffs = 0;

/* State mixing buffers */
static STATE_BUF *sbuffs;
static unsigned int no_sbuffs = 0;

/* Array of dynamic volume multipliers, alloc'd to no_ebuffs entries,
 * and the count of currently active buffers (initially zero)
 */
static double *dyn_mul;
static unsigned int dyn_buf_cnt = 0;
The voice mixer does more than mix. It also picks the next random segment to play for a given state sound, applies effects to sounds in real time, checks the priority queue for waiting events whenever a mixed event finishes playing, and calculates event volumes dynamically. The voice mixer also decides how many channels are allocated to events and how many to states. By default, state mixing channels make up 1/4 of the total number of voices. With 16 voices, that means 4 channels are reserved for state sounds and 12 events can play simultaneously. The ratio can be changed through the constant PERCENT_STATE_SOUNDS in main.h. peepd defaults to 16 voices.
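The channel split described above can be sketched as follows. This is only an illustration: the value given to PERCENT_STATE_SOUNDS here is assumed from the 1/4 default mentioned in the text (its actual definition in main.h is not shown), and split_voices is a hypothetical helper, not a function from mixer.c.

```c
/* Assumed value, taken from the 1/4 default described in the text.
 * The real definition lives in main.h and may look different. */
#define PERCENT_STATE_SOUNDS 0.25

/* Hypothetical helper: divide the total voice count into channels
 * reserved for state sounds and channels left over for events. */
static void split_voices(unsigned int voices,
                         unsigned int *state_chans,
                         unsigned int *event_chans)
{
    /* Truncates toward zero when the product is not a whole number. */
    *state_chans = (unsigned int)(voices * PERCENT_STATE_SOUNDS);
    *event_chans = voices - *state_chans;
}
```

With the default of 16 voices, this gives 4 state channels and 12 event channels, matching the figures above.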
The mixer event data structure is an array of EVENT_BUF records. Each record specifies which sound is on the given event channel, the offset the sound is currently at, the sound's stereo location, which real-time effects to apply to the sound, and the length of the sound. Mixing is done by calculating a given sound's stereo location, applying the appropriate real-time effects, and then summing all the sounds together. To avoid clipping, a dynamic mixing algorithm scales the volumes of the sounds. It does this by keeping track of the multipliers it is currently using and the number of voices currently playing. When a new sound comes in, it sums all of the dynamic multipliers (the maximum possible sum is 1.0) and determines how much headroom, or space, is left for calculating the volume multiplier of the new sound. As a result, the first sounds in a long sequence are louder than the trailing sounds, but the volumes of all sounds quickly even out under a continuous stream of events. The configure option -with-static-volume disables the dynamic volume algorithm and simply divides all sounds by the number of voices used, which avoids any possible clipping. This is a legacy option, but it could offer some performance gains on really slow machines.
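A rough sketch of the headroom idea follows. The text does not give the exact formula, so the rule used here (a new sound takes a fixed share of whatever headroom is left) is an assumption chosen only because it reproduces the described behaviour, where early sounds come out louder than later ones. MAX_VOICES, HEADROOM_SHARE and the helper functions are hypothetical, and dyn_mul is a fixed-size array here rather than the allocated array used in mixer.c.

```c
#define MAX_VOICES     16    /* assumed: matches the 16-voice default */
#define HEADROOM_SHARE 0.5   /* assumed: share of free headroom given to a new sound */

/* One multiplier per event channel; 0.0 means the channel is idle. */
static double dyn_mul[MAX_VOICES];

/* Sum of the multipliers currently in use. The scheme keeps this at
 * or below 1.0, so the summed output cannot clip. */
static double used_headroom(void)
{
    double sum = 0.0;
    for (int i = 0; i < MAX_VOICES; i++)
        sum += dyn_mul[i];
    return sum;
}

/* Assign a multiplier to a sound starting on channel ch. */
static double start_sound(int ch)
{
    dyn_mul[ch] = 0.0;                      /* do not count the channel itself */
    double headroom = 1.0 - used_headroom();
    if (headroom < 0.0)
        headroom = 0.0;
    dyn_mul[ch] = headroom * HEADROOM_SHARE;
    return dyn_mul[ch];
}

/* Give the multiplier back when the sound on channel ch finishes. */
static void end_sound(int ch)
{
    dyn_mul[ch] = 0.0;
}

/* Static alternative: scale every sample by the voice count, so even
 * the worst-case sum of all voices stays within the 16-bit range. */
static short static_scale(short sample, unsigned int voices)
{
    return (short)(sample / (int)voices);
}
```

Under these assumptions the first sound gets a multiplier of 0.5, the second 0.25, and so on, while finished sounds return their share so later events can use it. The static variant needs no bookkeeping at all, which is where the possible performance gain on slow machines would come from.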
The state data structures are a little more complicated. Individual state sounds are loaded into STATE_SND records and are kept in the mixer. The mixer keeps track of which sound segment is playing, its stereo location, the real-time effects to apply, and the number of sound segments associated with the state sound. To form a continuous stream of output, the mixer strings the segments together, choosing the next segment at random. Individual state sound records are stored in a threshold data structure. The threshold record tells the mixer which state record to select based on the volume of the state. This allows, for example, a stream sound to scale up to a waterfall sound to better represent increasing activity. The engine modifies the volume and stereo location fields of the data structure directly, and the mixer calculates the actual volume and stereo location of the sound dynamically while mixing.
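The threshold lookup and the random choice of the next segment could look roughly like the sketch below. The type definitions repeat the ones from mixer.c so the example stands alone. The two helper functions are hypothetical, and treating both bounds as inclusive is an assumption.

```c
#include <stdlib.h>

typedef struct {
    short **snd_buf;         /* array of state segments */
    unsigned int *len;       /* length of segments */
    unsigned int snd_cnt;    /* number of sounds per state */
    unsigned int snd_no;     /* current segment to look at */
    unsigned int pos;        /* position in that segment */
} STATE_SND;

typedef struct {
    double l_bound;          /* lower bound for the state */
    double h_bound;          /* upper bound for the state */
    STATE_SND state_snd;     /* the state sound for this entry */
} THRESHOLD;

typedef struct {
    THRESHOLD *thresh;       /* threshold of each state segment */
    int thresh_cnt;          /* number of thresholds */
    double stereo_pos;       /* stereo (left) */
    double vol;              /* volume */
    int filter_flag;         /* flag of effect filters to apply */
} STATE_BUF;

/* Hypothetical helper: return the state sound of the first threshold
 * whose bounds contain the current volume, or NULL if none match.
 * Inclusive bounds are an assumption. */
static STATE_SND *select_state_snd(STATE_BUF *sb)
{
    for (int i = 0; i < sb->thresh_cnt; i++) {
        THRESHOLD *t = &sb->thresh[i];
        if (sb->vol >= t->l_bound && sb->vol <= t->h_bound)
            return &t->state_snd;
    }
    return NULL;
}

/* Hypothetical helper: once the current segment has played out, pick
 * the next one at random and rewind to its start. */
static void next_segment(STATE_SND *s)
{
    if (s->snd_cnt == 0)
        return;
    s->snd_no = (unsigned int)rand() % s->snd_cnt;
    s->pos = 0;
}
```

Because the engine only writes vol and stereo_pos, a lookup like this can run on every mixing pass and pick up a change of threshold as soon as the engine raises or lowers the state's volume.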
During the mixing process, real-time effects are applied to the sound data, depending on which flags are set in the filter_flag member variable. The effects architecture is outlined in detail in a subsequent section.
Another point to note is that the mixer aims to feed enough sound data into the audio buffers on the sound card that it can sleep for a while before continuing its calculations. This involves a tradeoff between response time and performance. The "chunk size" used is roughly 1/2 second of sound. The mixer sleeps for about half of that time and then begins calculating again. The idea is to always keep the audio buffer on the sound card full while maximizing sleep time. The current compromise works reasonably well but could be tuned further.
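The timing arithmetic behind this can be sketched as below. The sample rate is an assumption, since the text does not state the output format, and both helpers are hypothetical rather than functions from mixer.c.

```c
/* Assumed output rate in frames per second; not stated in the text. */
#define SAMPLE_RATE 44100u

/* Frames in one chunk: roughly half a second of sound. */
static unsigned int chunk_frames(unsigned int rate)
{
    return rate / 2;
}

/* Microseconds to sleep after handing a chunk to the sound card:
 * about half of the chunk's playing time. rate must be non-zero. */
static unsigned int sleep_usec(unsigned int frames, unsigned int rate)
{
    unsigned long long chunk_usec =
        (unsigned long long)frames * 1000000ull / rate;
    return (unsigned int)(chunk_usec / 2);
}
```

At the assumed rate of 44100 frames per second, a chunk holds 22050 frames and the mixer would sleep for about 250000 microseconds. A smaller chunk would improve response time at the cost of waking up more often.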
Finally, the voice mixer interfaces with actual sound devices through an abstraction layer, which is introduced in a subsequent section.