EDIT 12/03/2022:
Final Audio CD offset correction algorithm (technical):
1. if there is one and only one possible disc write offset, applying which perfectly aligns audio silence (level: 0) ranges with TOC index 0 ranges, use it
2. else if there is non-zero data in lead-out and that data can be fully shifted out (left) without spanning non-zero data into lead-in, correct offset with a minimum shift required
3. else if there is non-zero data in lead-in and that data can be fully shifted out (right) without spanning non-zero data into lead-out, correct offset with a minimum shift required
4. else apply offset 0
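For illustration, the four steps above could be sketched like this (a rough sketch in Python; the sample-span representation of the disc and the offset sign convention are my own simplifications, not actual redumper code):

```python
def choose_offset(perfect_offsets, data_start, data_end, leadout):
    """Pick a write offset correction, in samples.

    perfect_offsets: offsets that perfectly align silence with index 0 ranges
    data_start/data_end: non-zero data span of the disc, in samples
    leadout: lead-out start, in samples; the program area is [0, leadout)
    Positive offset = shift data right, negative = shift left (assumed convention).
    """
    # 1. exactly one perfect offset -> use it
    if len(perfect_offsets) == 1:
        return perfect_offsets[0]
    in_leadout = data_end - leadout  # non-zero samples currently in lead-out
    in_leadin = -data_start          # non-zero samples currently in lead-in
    # 2. shift lead-out data left if that doesn't push data into lead-in
    if in_leadout > 0 and data_start - in_leadout >= 0:
        return -in_leadout
    # 3. shift lead-in data right if that doesn't push data into lead-out
    if in_leadin > 0 and data_end + in_leadin <= leadout:
        return in_leadin
    # 4. otherwise leave as is
    return 0
```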
Dump format specification notes:
Regardless of the applied offset, we always operate on the LBA [ 0 .. lead-out ) range when we perform a track split. This is also true for data discs. If, as a result of mastering, there is non-zero data either in the lead-in ( -inf .. 0 ) or the lead-out [ lead-out .. +inf ), it is preserved in separate files. For discs with data tracks, non-zero data means the descrambled data portion of a sector (fields such as sync, MSF, mode, subheader and ECC/EDC are excluded). The resulting lead-in/lead-out files should be trimmed to the minimum non-zero data size, the lead-in from the front and the lead-out from the back, but they should remain sector-aligned (size divisible by 2352).
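The trimming rule above could look like this (a sketch; `trim_leadin`/`trim_leadout` are hypothetical helper names operating on raw byte buffers):

```python
SECTOR = 2352  # bytes per CD sector

def trim_leadin(data: bytes) -> bytes:
    # trim zero bytes from the front, keeping the result sector-aligned
    first = next((i for i, b in enumerate(data) if b), len(data))
    return data[(first // SECTOR) * SECTOR:]

def trim_leadout(data: bytes) -> bytes:
    # trim zero bytes from the back, keeping the result sector-aligned
    last = next((i for i in range(len(data) - 1, -1, -1) if data[i]), -1)
    if last < 0:
        return b""
    return data[:(last // SECTOR + 1) * SECTOR]
```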
Considerations for matching Audio CD with different write offsets
For each disc, dumping software can generate a checksum/hash of the non-zero sample span of the data, e.g. aligned to the 4-byte CD sample size. Such a hash can be used for disc identification as well as for matching Audio CDs with different write offsets, as in such cases the hash would be the same. This applies to both audio and data discs. I suggest using a longer SHA-1 hash rather than CRC32, just to be future proof: it's quite likely that we would get a couple of CRC32 collisions in the ~100K discs currently in the DB. As a side perk, it can also serve as a unique disc ID which can be easily looked up in the database, if such a capability is ever implemented at redump.org. In any case, I would like to have such a hash in Audio CD entries just for write offset matching.
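Such a hash could be computed along these lines (a sketch; the trimming to 4-byte sample boundaries is my reading of the alignment suggestion above, and `audio_hash` is a hypothetical name):

```python
import hashlib

SAMPLE = 4  # one stereo 16-bit CD sample = 4 bytes

def audio_hash(data: bytes) -> str:
    # SHA-1 over the non-zero span, trimmed to sample boundaries, so the same
    # audio written with different offsets hashes to the same value
    first = next((i for i, b in enumerate(data) if b), len(data))
    last = next((i for i in range(len(data) - 1, -1, -1) if data[i]), -1)
    if last < 0:
        return hashlib.sha1(b"").hexdigest()
    return hashlib.sha1(data[(first // SAMPLE) * SAMPLE:
                             (last // SAMPLE + 1) * SAMPLE]).hexdigest()
```

The same non-zero audio shifted by a whole number of samples then yields an identical hash.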
END OF EDIT, the rest of the information here is kept for reference and archival purposes.
This topic is to clarify and decide on how we manage disc write offset for audio discs.
The current status quo is that we always dump audio discs with offset 0, as there is no reliable reference point in the audio stream (in contrast to a data track) which could be used to determine the offset. This approach has a number of disadvantages, such as:
* shifted audio data in pre-gap / lead-out which we don't currently preserve
* occasional imperfect track splits which cut into the middle of audio tracks, e.g. you hear a bit of the next track at the end of the current one, or of the previous track at the beginning of the current one
I believe I solved both of these problems in redumper. Let me define some terminology first.
A Perfect Audio Offset is a disc write offset which guarantees that no data is shifted into the lead-out and that the track split doesn't cut into the middle of a track.
Perfect Audio Offset implementation details
For a given audio disc, I build a silence map based on TOC/subchannel information; essentially it's the INDEX 00 CUE entries, which are almost always empty (silent). As a next step, I build a similar silence map based on the audio stream. Finally, for each offset within the constrained [-150 .. lead-out] range, I try to line up these two maps so that the TOC/subchannel-based one fits into the audio stream one. If it fits, it's a perfect audio offset.
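The fitting step might be sketched like this (silence ranges as half-open sample intervals; the representation, the function name and the offset sign convention are assumptions on my part):

```python
def find_perfect_offsets(toc_silence, audio_silence, candidates):
    # toc_silence: INDEX 00 ranges derived from TOC/subchannel, as (start, end)
    # audio_silence: zero-level runs found in the audio stream, as (start, end)
    # an offset is "perfect" if every shifted TOC range fits inside some audio run
    return [off for off in candidates
            if all(any(a <= s + off and e + off <= b
                       for a, b in audio_silence)
                   for s, e in toc_silence)]
```

Note that the result is naturally a range of offsets when the audio silence runs are wider than the TOC ranges, which is what the logic below deals with.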
The current audio offset logic
* favor an offset value from the perfect offset range if available
* if multiple perfect offset values are available (a range), try to shift data out of the pre-gap (if needed) while still staying within the perfect range
* otherwise, if there is no data in the pre-gap, favor offset 0 if it belongs to the perfect range
* finally, if offset 0 doesn't belong to the perfect range, use the value closest to 0 within the perfect range
* if no perfect offset is available, try to shift data out of the lead-out and pre-gap (only if the whole pre-gap data can be shifted out), provided that doesn't lead to data loss
or pseudocode:
if(perfect_audio_offset_available)
{
    if(perfect_offset_single_value)
        use_offset_value();
    else if(perfect_offset_value_range)
    {
        if(data_in_pregap)
        {
            if(enough_space_to_get_rid_of_whole_data_in_pregap AND still_within_perfect_range)
                move_minimum_data_right();
        }
        else if(zero_offset_belongs_to_perfect_range)
            use_zero_offset();
        else
            choose_the_offset_closest_to_zero();
    }
}
else
{
    if(data_in_leadout)
        move_minimum_data_left();
    else if(data_in_pregap AND enough_space_to_get_rid_of_whole_data_in_pregap)
        move_minimum_data_right();
}
Pre-Gap notes
Based on the discs I own and the discs I have access to, for most audio discs with data in the pre-gap that data is not the result of a write offset but rather of the way the disc was mastered. The values are often close to silence but not zeroed, and sometimes it's part of a hidden track (HTOA). In cases like this, there is no way to move that data fully out of the pre-gap without shifting it into the lead-out, as there is not much space (it's common to have 1-2 seconds of pre-gap audio, which is a lot). I can definitely say it's worth preserving by extending track 1 a fixed 150 sectors back in all such cases, optionally marking that in the CUE? Before anyone says it's stupid, DIC already does a similar track 1 extension for CDi-Ready discs.
Statistics
With the new method, I redumped all my audio discs and sadikyo redumped some of his. I shared the new dumper version with Intothisworld and he shared it with ppltoast, but I have yet to get results from them. The current merged detailed statistics are available here:
https://docs.google.com/spreadsheets/d/ … sp=sharing
TL;DR
73 discs - match redump (offset 0 is one of the perfect offsets)
9 discs - no perfect offset found (no distinctive silence in index 0 or no index 0 entries at all), offset 0 used so matches redump
7 discs - only one perfect offset found, and it's the true offset value the disc was mastered with, will require DB update
27 discs - perfect offset range excludes 0, will require DB update
3 discs - have pre-gap data which is impossible to fully shift out and which is not currently preserved, will require DB update
Side Effects
This method also works really well for PSX GameShark Update discs and Jaguar discs without relying on magic numbers, and we get perfect splits there too.
More Side Effects
If we left-align (or right-align) the offset based on the perfect offset range, we will get matches for audio discs with the same data but different write offsets, such as:
http://redump.org/disc/77301
I would like to hear your opinions on this, let's discuss.