Audio compression
Review
Pete Morris has shown you how to write a program which can compress and decompress sound data using a "codec". A codec is a "coder-decoder" driver used by the Audio Compression Manager to reduce the size of a recorded file. There are many different compression formats, and there are a number of codecs on your PC. Start | Settings | Control Panel | Multimedia / Devices gives you a list but does not tell you much about them.
How do you find out what formats (channels, sampling rate, etc.) they support? Getting this information is a journey down some "twisty little passages" (if you remember the early text adventure games), but hold my hand while I take you through and describe how a viewer for the ACM drivers is built. The whole program code is attached for you to follow through and to enable you to compile it. I built it with Delphi 3 but have included the form as a text file if you need it for a later version.
The format definition used to define the audio format consists of the data stored in the elements of the WAVEFORMATEX structure, implemented by Delphi as TWaveFormatEx as follows :-
TWaveFormatEx = packed record
  wFormatTag: Word;        { format tag }
  nChannels: Word;         { number of channels (mono, stereo, etc.) }
  nSamplesPerSec: DWORD;   { sample rate }
  nAvgBytesPerSec: DWORD;  { for buffer estimation }
  nBlockAlign: Word;       { block size of data }
  wBitsPerSample: Word;    { number of bits per sample of mono data }
  cbSize: Word;            { the count of the number of extra bytes }
end;
The FormatTag identifies the particular type of format, such as PCM (the standard uncompressed conversion into digital values), GSM 6.10, or MP3. These have Microsoft constants declared as WAVE_FORMAT_PCM ($1), WAVE_FORMAT_GSM610 ($31) and WAVE_FORMAT_MPEGLAYER3 ($55) respectively. For each format tag there will usually be a number of variations of channels (mono or stereo), sampling rate (from 8000 Hz to 44100 Hz, or quoted as BitsPerSec) and BitsPerSample (8, 16, or maybe 0 for a complex compression algorithm in which the bits / sample are really meaningless).
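To make the fields concrete, here is a minimal sketch of filling in a PCM format. It assumes the usual PCM relationships (block align is channels times bytes per sample, and average bytes per second is sample rate times block align). FillPcmFormat is a hypothetical helper, not part of the attached program, and the record is redeclared locally so the sketch compiles without MMSystem.

```pascal
program PcmFormatSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Local copy of the record so the sketch does not depend on MMSystem.
  TWaveFormatEx = packed record
    wFormatTag: Word;
    nChannels: Word;
    nSamplesPerSec: Cardinal;
    nAvgBytesPerSec: Cardinal;
    nBlockAlign: Word;
    wBitsPerSample: Word;
    cbSize: Word;
  end;

const
  WAVE_FORMAT_PCM = $0001;

// Hypothetical helper: derives the dependent fields from the three
// values you are free to choose for PCM.
procedure FillPcmFormat(var Fmt: TWaveFormatEx; Channels: Word;
  SamplesPerSec: Cardinal; BitsPerSample: Word);
begin
  Fmt.wFormatTag := WAVE_FORMAT_PCM;
  Fmt.nChannels := Channels;
  Fmt.nSamplesPerSec := SamplesPerSec;
  Fmt.wBitsPerSample := BitsPerSample;
  // One block holds one sample for every channel.
  Fmt.nBlockAlign := (Channels * BitsPerSample) div 8;
  Fmt.nAvgBytesPerSec := SamplesPerSec * Fmt.nBlockAlign;
  Fmt.cbSize := 0; // PCM carries no extra bytes
end;

var
  Fmt: TWaveFormatEx;
begin
  FillPcmFormat(Fmt, 2, 44100, 16);
  WriteLn('Block align:   ', Fmt.nBlockAlign);      // 4
  WriteLn('Avg bytes/sec: ', Fmt.nAvgBytesPerSec);  // 176400
end.
```

For compressed formats these relationships generally do not hold, which is why the codec itself reports the values during enumeration.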
There are also some "Extra Bytes" added at the end of this structure which vary (in true C style) from format to format and are used for additional format information. cbSize gives the number of these extra bytes, for which memory must be allocated when you are enumerating the codecs.
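As an illustration of allocating room for the extra bytes, the sketch below reserves a buffer of SizeOf(TWaveFormatEx) plus a given number of extra bytes. AllocFormat is a hypothetical helper, and the value 2 is used only as an example (it is what GSM 6.10 uses for its samples-per-block word); when enumerating you would normally size the buffer from what ACM reports instead.

```pascal
program ExtraBytesSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Local copy of the record so the sketch does not depend on MMSystem.
  TWaveFormatEx = packed record
    wFormatTag: Word;
    nChannels: Word;
    nSamplesPerSec: Cardinal;
    nAvgBytesPerSec: Cardinal;
    nBlockAlign: Word;
    wBitsPerSample: Word;
    cbSize: Word;
  end;
  PWaveFormatEx = ^TWaveFormatEx;

// Hypothetical helper: allocates a zeroed format buffer with room for
// ExtraBytes after the fixed part, and records that count in cbSize.
function AllocFormat(ExtraBytes: Word): PWaveFormatEx;
begin
  GetMem(Result, SizeOf(TWaveFormatEx) + ExtraBytes);
  FillChar(Result^, SizeOf(TWaveFormatEx) + ExtraBytes, 0);
  Result^.cbSize := ExtraBytes;
end;

var
  P: PWaveFormatEx;
begin
  P := AllocFormat(2); // example value only
  try
    WriteLn('Fixed part:  ', SizeOf(TWaveFormatEx));             // 18
    WriteLn('Whole block: ', SizeOf(TWaveFormatEx) + P^.cbSize); // 20
  finally
    FreeMem(P);
  end;
end.
```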
Each codec must support conversion between a PCM format tag and its own particular format tag, but not necessarily covering all the options of channels, bits / sample, and sampling rate in the standard PCM. Microsoft supply a converter (not a codec) which converts from one PCM format to another. But it is only a conversion of the format; you get no more quality by converting from 8000 Hz mono at 8 bits / sample to 44100 Hz stereo at 16 bits / sample.
In order to display the details of the codecs, Microsoft provides ACM (Audio Compression Manager) functions for enumerating drivers (i.e. codecs) and their formats. When a driver is found following a call to one of these functions, the ACM calls a callback function in your program. This is really the same as an event in a normal Delphi object class, except that it is a function and not a procedure. You write some code for the function (which has a specified list of parameters just like a Delphi event handler), and when the event occurs (ACM has found a driver) it calls the function whose address you have supplied. The parameters of the function are pointers or handles to the driver it has found, or a pointer to a list of details about the driver it has found. If you were looking for just one particular driver, then when you recognised it as the one ACM had found, you would return false as the function result. This would cause ACM to stop enumerating. Of course, as we want a list of all drivers, we will always return true.
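The return-value convention can be sketched without touching Windows at all. In the example below, EnumNames is a hypothetical stand-in for an enumerating API and StopAtGsm is a hypothetical callback. Real ACM callbacks are also declared stdcall and take different parameters, so treat this only as an illustration of "return false to stop".

```pascal
program CallbackSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // The shape of the callback: a plain function returning Boolean.
  TNameCallback = function(const Name: string): Boolean;

// Hypothetical enumerator: calls back once per item and stops as soon
// as the callback returns False.
procedure EnumNames(const Names: array of string; Callback: TNameCallback);
var
  I: Integer;
begin
  for I := Low(Names) to High(Names) do
    if not Callback(Names[I]) then
      Break;
end;

// Hypothetical callback: keeps going until it sees the item it wants.
function StopAtGsm(const Name: string): Boolean;
begin
  WriteLn('Found: ', Name);
  Result := Name <> 'GSM 6.10';
end;

begin
  // Prints PCM and GSM 6.10; the third name is never reported.
  EnumNames(['PCM', 'GSM 6.10', 'MPEG Layer-3'], StopAtGsm);
end.
```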
The ACM callback function must be an ordinary stand-alone function, not a method. In order to call a method we would need a reference to an instantiated object whose method would be actioned on callback. We can implement this by passing a reference to our object when we ask for drivers to be enumerated. Windows passes this object reference back for us to use to call a method. This technique is not necessary for this program, but it is a nice way to bring Windows callbacks within Delphi objects.
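Here is a minimal sketch of that forwarding trick, again with a hypothetical enumerator (EnumItems) standing in for Windows. TViewer plays the part of the form. The names are illustrative and are not those used in the attached program.

```pascal
program InstanceSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for the form.
  TViewer = class
  public
    Count: Integer;
    function HandleItem(Id: Integer): Boolean;
  end;

  // Plain callback type carrying an untyped "instance" value.
  TItemCallback = function(Id: Integer; Instance: Pointer): Boolean;

function TViewer.HandleItem(Id: Integer): Boolean;
begin
  Inc(Count);
  WriteLn('Item ', Id, ' seen, total so far ', Count);
  Result := True; // keep enumerating
end;

// The stand-alone function: typecast the instance value back to the
// object and hand the work to its method.
function ItemProc(Id: Integer; Instance: Pointer): Boolean;
begin
  Result := TViewer(Instance).HandleItem(Id);
end;

// Hypothetical enumerator: passes Instance back unchanged each time.
procedure EnumItems(Callback: TItemCallback; Instance: Pointer);
var
  I: Integer;
begin
  for I := 1 to 3 do
    if not Callback(I, Instance) then
      Break;
end;

var
  Viewer: TViewer;
begin
  Viewer := TViewer.Create;
  try
    EnumItems(ItemProc, Pointer(Viewer));
  finally
    Viewer.Free;
  end;
end.
```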
Once ACM has found a driver and returned a reference to it, we must enumerate the formats it will handle. A similar enumerating call and callback will cause ACM to enumerate through the formats of that driver, and when it finds a format it calls our callback function with details of the format it has found (including its format tag). From these details coming back, we construct a treeview which displays all the details for us.
One useful item of information is the count of extra bytes needed at the end of a TWaveFormatEx structure to define the particular format we are converting. The acmMetrics() function reports the size of the largest format structure, either as a system-wide maximum or as the maximum for one driver; anything beyond SizeOf(TWaveFormatEx) is extra bytes. The value is obtained and added to the displayed data.
The program first of all gets general information on the Audio Compression Manager which is installed on your PC (see ACMInstalled()) by calling acmGetVersion to return the version number dword which we unpack and interpret. Then we call acmMetrics with a flag to return the number of codecs installed. This information is displayed above the treeview.
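A minimal console sketch of those two calls follows. The external declarations are my own translation of the C prototypes in msacm.h and are assumptions, not copied from the attached program (which may use a translated header unit). It assumes Windows and a compiler such as Free Pascal or Delphi 4 or later, and the version layout assumed is major in the top byte, minor in the next byte and build number in the low word.

```pascal
program AcmInfoSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  HACMOBJ = Pointer; // ACM object handle; nil asks for system-wide metrics

const
  ACM_METRIC_COUNT_CODECS = 2; // value from msacm.h

// Assumed translations of the msacm.h prototypes.
function acmGetVersion: Cardinal; stdcall;
  external 'msacm32.dll' name 'acmGetVersion';
function acmMetrics(hao: HACMOBJ; uMetric: Cardinal; var Metric): Cardinal;
  stdcall; external 'msacm32.dll' name 'acmMetrics';

var
  Version, CodecCount: Cardinal;
begin
  // Unpack $AABBCCCC into major AA, minor BB and build CCCC.
  Version := acmGetVersion;
  WriteLn('ACM version ', Version shr 24, '.', (Version shr 16) and $FF,
    ' build ', Version and $FFFF);

  // A zero result means success; CodecCount then holds the answer.
  CodecCount := 0;
  if acmMetrics(nil, ACM_METRIC_COUNT_CODECS, CodecCount) = 0 then
    WriteLn('Codecs installed: ', CodecCount)
  else
    WriteLn('acmMetrics failed');
end.
```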
Application.ProcessMessages is called to display the form while we enumerate for the information to be displayed in a treeview on the form.
Then we call acmDriverEnum() passing the address of our callback function, a reference to the form object (ACM passes this back to us unchanged in the callback and we use it to call a form method) and a flag indicating that we want all drivers even if they are disabled.
When ACM finds a driver and calls back it passes back to us (in DriverEnumProc()) a handle to the DriverId, the reference to our form we passed it, and a value which holds flags indicating what the driver supports. We strip off the form reference, typecast it to the form type and call a method (ACMEnumDrvrMethod()) of the form with parameters of the DriverId and the support flags.
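The driver enumeration described above might look roughly like the console sketch below. The declarations are assumed translations of msacm.h, TViewer is a hypothetical stand-in for the form, and EnumDriver is a hypothetical method name (the attached program calls its own method ACMEnumDrvrMethod()). It assumes Windows and Free Pascal or Delphi 4 or later.

```pascal
program DriverEnumSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  HACMDRIVERID = Pointer;
  // Callback shape assumed from ACMDRIVERENUMCB in msacm.h.
  TDriverEnumCB = function(hadid: HACMDRIVERID; dwInstance: Pointer;
    fdwSupport: Cardinal): LongBool; stdcall;

  // Hypothetical stand-in for the form.
  TViewer = class
  public
    Count: Integer;
    function EnumDriver(hadid: HACMDRIVERID; Support: Cardinal): Boolean;
  end;

const
  // Values from msacm.h.
  ACM_DRIVERENUMF_DISABLED            = Cardinal($80000000);
  ACMDRIVERDETAILS_SUPPORTF_CODEC     = $00000001;
  ACMDRIVERDETAILS_SUPPORTF_CONVERTER = $00000002;
  ACMDRIVERDETAILS_SUPPORTF_DISABLED  = Cardinal($80000000);

function acmDriverEnum(fnCallback: TDriverEnumCB; dwInstance: Pointer;
  fdwEnum: Cardinal): Cardinal; stdcall;
  external 'msacm32.dll' name 'acmDriverEnum';

function TViewer.EnumDriver(hadid: HACMDRIVERID; Support: Cardinal): Boolean;
begin
  Inc(Count);
  Write('Driver ', Count, ':');
  // Decode a few of the support flags.
  if (Support and ACMDRIVERDETAILS_SUPPORTF_CODEC) <> 0 then
    Write(' codec');
  if (Support and ACMDRIVERDETAILS_SUPPORTF_CONVERTER) <> 0 then
    Write(' converter');
  if (Support and ACMDRIVERDETAILS_SUPPORTF_DISABLED) <> 0 then
    Write(' (disabled)');
  WriteLn;
  Result := True; // we want every driver, so never stop early
end;

// Stand-alone callback: recover the object and forward to its method.
function DriverEnumProc(hadid: HACMDRIVERID; dwInstance: Pointer;
  fdwSupport: Cardinal): LongBool; stdcall;
begin
  Result := TViewer(dwInstance).EnumDriver(hadid, fdwSupport);
end;

var
  Viewer: TViewer;
begin
  Viewer := TViewer.Create;
  try
    if acmDriverEnum(DriverEnumProc, Pointer(Viewer),
      ACM_DRIVERENUMF_DISABLED) <> 0 then
      WriteLn('acmDriverEnum failed');
  finally
    Viewer.Free;
  end;
end.
```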
We extract some details of the driver from the support flags and insert this information in the treeview. Then we open the driver with the DriverId to get a handle to the driver itself. As we have to provide a TWaveFormatEx structure which is big enough to hold any extra bytes (even though we don't use that information), we call acmMetrics to get the largest size for the driver. Then we fill a TACMFormatDetails structure and call acmFormatEnum to enumerate the formats, again passing a reference to the form, the driver handle, the TACMFormatDetails structure, and the address of our callback function.
When we receive a format enumeration callback we pass it to a method of the form (as we did with the driver enumeration callback) and extract details of the format and display them in the treeview.
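Putting the open, size query and format enumeration together gives something like the console sketch below. It is a sketch under stated assumptions rather than the attached program's code: the record layout and prototypes are my translation of msacm.h (using the ANSI entry point acmFormatEnumA), the output goes to the console instead of a treeview, and no form reference is passed. It assumes Windows and Free Pascal or Delphi 4 or later.

```pascal
program FormatEnumSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  HACMDRIVERID = Pointer;
  HACMDRIVER = Pointer;
  // Layout assumed from ACMFORMATDETAILSA in msacm.h (byte-packed).
  TACMFormatDetails = packed record
    cbStruct: Cardinal;
    dwFormatIndex: Cardinal;
    dwFormatTag: Cardinal;
    fdwSupport: Cardinal;
    pwfx: Pointer;      // buffer that receives each TWaveFormatEx
    cbwfx: Cardinal;    // size of that buffer in bytes
    szFormat: array[0..127] of AnsiChar;
  end;
  TDriverEnumCB = function(hadid: HACMDRIVERID; dwInstance: Pointer;
    fdwSupport: Cardinal): LongBool; stdcall;
  TFormatEnumCB = function(hadid: HACMDRIVERID; var afd: TACMFormatDetails;
    dwInstance: Pointer; fdwSupport: Cardinal): LongBool; stdcall;

const
  MsAcm = 'msacm32.dll';
  ACM_METRIC_MAX_SIZE_FORMAT = 50; // value from msacm.h

function acmDriverEnum(fnCallback: TDriverEnumCB; dwInstance: Pointer;
  fdwEnum: Cardinal): Cardinal; stdcall; external MsAcm name 'acmDriverEnum';
function acmDriverOpen(var had: HACMDRIVER; hadid: HACMDRIVERID;
  fdwOpen: Cardinal): Cardinal; stdcall; external MsAcm name 'acmDriverOpen';
function acmDriverClose(had: HACMDRIVER; fdwClose: Cardinal): Cardinal;
  stdcall; external MsAcm name 'acmDriverClose';
function acmMetrics(hao: Pointer; uMetric: Cardinal; var Metric): Cardinal;
  stdcall; external MsAcm name 'acmMetrics';
function acmFormatEnum(had: HACMDRIVER; var afd: TACMFormatDetails;
  fnCallback: TFormatEnumCB; dwInstance: Pointer; fdwEnum: Cardinal): Cardinal;
  stdcall; external MsAcm name 'acmFormatEnumA';

// Called once per format; afd describes the format just found.
function FormatEnumProc(hadid: HACMDRIVERID; var afd: TACMFormatDetails;
  dwInstance: Pointer; fdwSupport: Cardinal): LongBool; stdcall;
begin
  WriteLn('    tag ', afd.dwFormatTag, ': ', PAnsiChar(@afd.szFormat[0]));
  Result := True;
end;

// Called once per driver; opens it and enumerates its formats.
function DriverEnumProc(hadid: HACMDRIVERID; dwInstance: Pointer;
  fdwSupport: Cardinal): LongBool; stdcall;
var
  had: HACMDRIVER;
  MaxSize: Cardinal;
  Buf: Pointer;
  afd: TACMFormatDetails;
begin
  Result := True; // carry on with the next driver whatever happens here
  if acmDriverOpen(had, hadid, 0) <> 0 then
    Exit;
  try
    // Largest format structure this driver can report, extra bytes included.
    MaxSize := 0;
    if (acmMetrics(had, ACM_METRIC_MAX_SIZE_FORMAT, MaxSize) <> 0) or
       (MaxSize = 0) then
      Exit;
    GetMem(Buf, MaxSize);
    try
      FillChar(Buf^, MaxSize, 0);
      FillChar(afd, SizeOf(afd), 0); // leaves dwFormatTag as 0 (unknown)
      afd.cbStruct := SizeOf(afd);
      afd.pwfx := Buf;
      afd.cbwfx := MaxSize;
      WriteLn('Driver (largest format structure ', MaxSize, ' bytes)');
      acmFormatEnum(had, afd, FormatEnumProc, dwInstance, 0);
    finally
      FreeMem(Buf);
    end;
  finally
    acmDriverClose(had, 0);
  end;
end;

begin
  if acmDriverEnum(DriverEnumProc, nil, 0) <> 0 then
    WriteLn('acmDriverEnum failed');
end.
```

The buffer is sized from the driver's own maximum rather than the system-wide one, which matches the description above and keeps the allocation as small as it can safely be.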
Because we display so many elements in the treeview, I have put an indicator of the treeview level at the side of the code creating the treeview nodes, to help you.
By Alan G Lloyd - AlanGLLoyd@aol.com