Ok, here's another update based on the discs I purchased and dumped myself. Initially this was asked by Fireball as he has experience with them, but these are really good shitty audio use cases plus some general info on what we can encounter.
Dracula: Music Collection
http://redump.org/disc/14890/
Nothing special about this disc other than that the two masterings differ by the offset; the same happens here: http://redump.org/disc/77301/
The specified possible write offsets +390 and +684 cut into tracks, so I don't think they are relevant. The true offset should be in the perfect range [-4926 .. -3731] based on my redumper algo.
We can offset match such discs using two possible approaches:
1. Always shift each dump left-most or right-most so that, regardless of the offset, the dumps will match each other. The pro is that it's totally automatic. But a big con is that we would need to redump all audio entries, which is unrealistic.
2. By introducing something I'd call a "universal checksum". Basically, upon a successful dump, redumper calculates checksums of the right-most (or left-most) shifted data and outputs the crc/md5/sha-1 the usual way: <rom offset="+123" size="68701920" crc="060bb712" md5="47393f188ff00fafbdf77b5eb771dbd3" sha1="ef991d90b284b0c92ab2b4eb0eb77942e32bb98c" /> and notes the offset value needed for the right-most/left-most shift. We store this information somewhere for future reference. Every time a title that may be a different-offset verification is dumped, we compare universal checksums and if they match, we add another ringcode line to the matched entry with the deduced offset in relation to the previous entry.
The pro of this approach is that we don't dramatically change the way we dump compared to method (1), so already added dumps stay the way they are. The con is that it's not 100% automatic.
Personally, I'm all for (2): it is easy to implement and we can set a precedent that will be used in the audio dumping world.
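To make (2) a bit more concrete, here is a rough Python sketch of how such a universal checksum could be computed. This is not redumper's actual code: the function name universal_checksum, the choice of the left-most shift and the handling of an all-silent dump are my assumptions for illustration only. It treats the dump as 4-byte stereo samples, moves the leading zero samples to the end so the file size stays the same, and hashes the result.

```python
import hashlib
import zlib

SAMPLE_SIZE = 4  # CD-DA: 16-bit stereo, 2 channels x 2 bytes


def universal_checksum(audio: bytes) -> dict:
    """Shift the audio left-most and hash it (hypothetical helper).

    Returns the hashes plus the shift, in samples, that was applied.
    """
    if len(audio) % SAMPLE_SIZE:
        raise ValueError("audio length must be a whole number of samples")

    # Index of the sample that holds the first non-zero byte.
    first_nonzero = len(audio) - len(audio.lstrip(b"\x00"))
    shift = first_nonzero // SAMPLE_SIZE
    if first_nonzero == len(audio):
        shift = 0  # assumption: pure silence is left untouched

    # Move the leading zero samples to the end; size is unchanged.
    cut = shift * SAMPLE_SIZE
    shifted = audio[cut:] + bytes(cut)

    return {
        "shift": shift,
        "size": len(shifted),
        "crc": format(zlib.crc32(shifted), "08x"),
        "md5": hashlib.md5(shifted).hexdigest(),
        "sha1": hashlib.sha1(shifted).hexdigest(),
    }


# Two made-up "masterings" of the same payload at different offsets.
payload = bytes(range(1, 9))  # 2 non-zero samples
dump_a = bytes(3 * SAMPLE_SIZE) + payload + bytes(5 * SAMPLE_SIZE)
dump_b = bytes(5 * SAMPLE_SIZE) + payload + bytes(3 * SAMPLE_SIZE)

a = universal_checksum(dump_a)
b = universal_checksum(dump_b)
print(a["sha1"] == b["sha1"], b["shift"] - a["shift"])
```

If the universal checksums match, the difference between the two recorded shifts is the relative offset between the entries. This only holds when neither dump has non-zero data cut off at the edges.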
Tenbu Mega CD Special Mini Audio CD
http://redump.org/disc/6695/
This is very clean: redumper shifts 13 samples left out of the lead-out and everything still fits in the pre-gap nicely.
Micronet Music Collection Vol. 1
http://redump.org/disc/30335/
This has a huge non-zero chunk (22006 samples or 88024 bytes) in the lead-out. According to the proposed rules, we shift the data left out of there by 22006 samples. This gets rid of the lead-out data but spills 16 non-zero samples over into the pre-gap. Not ideal, but it's close to the truth, and the perfect range for this disc is [+8423 .. +21155]. IMO it's the best solution given that we preserve the whole data in one file.
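For anyone who wants to check the numbers, here is a tiny Python sketch of the sample/byte/sector arithmetic. The helper names are made up; the constants are the standard CD-DA ones (4 bytes per stereo sample, 588 samples or 2352 bytes per sector).

```python
SAMPLE_SIZE = 4           # bytes per 16-bit stereo sample
SAMPLES_PER_SECTOR = 588  # 44100 samples/s divided by 75 sectors/s
SECTOR_SIZE = SAMPLE_SIZE * SAMPLES_PER_SECTOR  # 2352 bytes


def samples_to_bytes(samples: int) -> int:
    """Size in bytes of a run of audio samples."""
    return samples * SAMPLE_SIZE


def samples_to_sectors(samples: int) -> tuple:
    """Split a sample count into (whole sectors, leftover samples)."""
    return divmod(samples, SAMPLES_PER_SECTOR)


print(samples_to_bytes(22006))    # 88024
print(samples_to_sectors(22006))  # (37, 250)
```

So the lead-out chunk on this disc is a bit over 37 sectors of audio.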
Oyaji Hunter Mahjong
http://redump.org/disc/39873/
This is exactly as the comments say. I stand corrected, this is more horrible. There are 68 sectors of data in the lead-out, there are 150 sectors of data in the pre-gap and there are ~670 sectors (1574524 bytes) of non-zero data in the TOC before the pre-gap. I capture everything in redumper and it seems to be consistent in the scram file. Offset 0 is used by default, and I extract leadout.bin as is and get the same checksums as calculated by Fireball, everything matches. The 150 sectors of data in the pre-gap are fully preserved in Track 1, but what to do with the data in the TOC? I don't know. Well, in fact I will propose a solution later, but that requires everybody to be open-minded.
Other Considerations
Now, with all these examples in mind, I have a modified idea which will let us capture every byte and be mostly redump compatible (including the site and the current DB).
What if we never shift audio, i.e. always use offset 0, but store the spillover lead-in and lead-out data in separate tracks? Something like the pregap.bin / leadout.bin that we don't currently "preserve", but in a more generalized way.
This fits in a very elegant way with the lead-out, as internally the lead-out is just another disc track with the AA track number, and it has all the track properties such as mode, data, positional subchannel etc. As in reality the lead-out track spans the whole disc, we trim all the zeroed data and make it sector aligned. If there is no data in the lead-out, we don't create a file, and that will satisfy 99% of all the use cases. But at the same time we accommodate the case where there is something there.
I see two other big benefits: we can preserve the Dreamcast logo data, which is the session 1 lead-out, and I sometimes see lead-out audio spillover in PSX discs, where it's not currently being preserved in any way. We can have the track defined in the CUE-sheet with all the appropriate properties, thus this data will be preserved by "data hungry" preservationists, whoever they are.
A similar approach goes for the non-zeroed lead-in track. If it's empty, like it is in 99% of cases, it won't exist. If it does, it's zero trimmed at the front and sector aligned. No data is ever lost, redump track compatibility is at an all-time high as it's CUE tied, and we add it to the website like a usual tracklist with hashes.
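Here is roughly what I mean by "zero trimmed and sector aligned", as a Python sketch. The function names are hypothetical and this is not how redumper does it internally; I'm also assuming the raw lead-out starts on a sector boundary and the raw lead-in ends on one, so the padding goes on the opposite side in each case.

```python
from typing import Optional

SECTOR_SIZE = 2352  # bytes in one raw CD sector


def _sector_aligned_size(length: int) -> int:
    """Round a byte count up to a whole number of sectors."""
    sectors = -(-length // SECTOR_SIZE)  # ceiling division
    return sectors * SECTOR_SIZE


def trim_leadout(data: bytes) -> Optional[bytes]:
    """Drop trailing zeros, then pad the tail back to a full sector."""
    stripped = data.rstrip(b"\x00")
    if not stripped:
        return None  # nothing there: no file is created
    return stripped.ljust(_sector_aligned_size(len(stripped)), b"\x00")


def trim_leadin(data: bytes) -> Optional[bytes]:
    """Drop leading zeros, then pad the front back to a full sector."""
    stripped = data.lstrip(b"\x00")
    if not stripped:
        return None  # nothing there: no file is created
    return stripped.rjust(_sector_aligned_size(len(stripped)), b"\x00")


# Made-up data: 3 sectors of lead-out with 2 non-zero bytes at the start,
# 3 sectors of lead-in with 1 non-zero byte at the very end.
leadout = b"\x01\x02" + bytes(3 * SECTOR_SIZE - 2)
leadin = bytes(3 * SECTOR_SIZE - 1) + b"\x07"

print(len(trim_leadout(leadout)), len(trim_leadin(leadin)))
print(trim_leadout(bytes(SECTOR_SIZE)))
```

Both made-up examples collapse to a single sector, and the all-zero case returns None, which is the "file doesn't exist" situation.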
Oyaji Hunter Mahjong example:
FILE "Oyaji Hunter Mahjong (Japan) (3DO Game Bundle) (Track 1#00).bin" BINARY
REM REDUMP LEADIN
TRACK 00 AUDIO
INDEX 00 00:00:00
FILE "Oyaji Hunter Mahjong (Japan) (3DO Game Bundle) (Track 1).bin" BINARY
TRACK 01 AUDIO
INDEX 01 00:00:00
FILE "Oyaji Hunter Mahjong (Japan) (3DO Game Bundle) (Track 2).bin" BINARY
TRACK 02 AUDIO
INDEX 00 00:00:00
INDEX 01 00:12:45
FILE "Oyaji Hunter Mahjong (Japan) (3DO Game Bundle) (Track 3).bin" BINARY
TRACK 03 AUDIO
INDEX 00 00:00:00
INDEX 01 00:09:60
FILE "Oyaji Hunter Mahjong (Japan) (3DO Game Bundle) (Track 4).bin" BINARY
TRACK 04 AUDIO
INDEX 00 00:00:00
INDEX 01 00:11:63
FILE "Oyaji Hunter Mahjong (Japan) (3DO Game Bundle) (Track 4@AA).bin" BINARY
REM REDUMP LEADOUT
TRACK 05 AUDIO
INDEX 01 00:00:00
Or variations of the naming/numbering scheme. I specifically chose # and @ for the filenames as these symbols sort before and after the plain number entry, so you get a nice listing, and this scheme supports multisession pre-gaps/lead-outs as we don't have to renumber anything.
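A quick Python check of the sort order claim, using the filenames from the CUE above. Note this uses plain code point ordering ('#' is 0x23, ')' is 0x29, '@' is 0x40); a file manager with locale-aware or "natural" sorting may order things differently, so treat it as a sketch.

```python
PREFIX = "Oyaji Hunter Mahjong (Japan) (3DO Game Bundle) "

# Deliberately shuffled list of the proposed filenames.
names = [
    PREFIX + "(Track 4@AA).bin",
    PREFIX + "(Track 2).bin",
    PREFIX + "(Track 1).bin",
    PREFIX + "(Track 4).bin",
    PREFIX + "(Track 1#00).bin",
    PREFIX + "(Track 3).bin",
]

# sorted() on str compares code points, like a simple byte-wise sort.
ordered = sorted(names)
for name in ordered:
    print(name)
```

The lead-in file lands right before Track 1 and the lead-out file right after Track 4, which is the listing I was after.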
We could use simply "Track 00" for lead-in and "Track 05" for lead-out but there has to be a good way of supporting this for multisession discs where there can be session lead-out/lead-in between two tracks with adjacent numbers.
Or, we don't have to add it to the CUE-sheet at all but in my opinion having it there ties all the files together for the preservation. We could even have special redump CUE tags for that, plenty of ways.