Yeah, I did send you a link at one point, but you were pretty busy with redumper, and I didn't have much of a chance to talk with you about it in the nuanced, in-depth way I was hoping to. And yeah, there's probably been a lot added since then. It's been, for the most part, a perpetual work in progress since around April.
Anyway, I wasn't going to, but I think screw it, I'm just going to go over my general thinking process on the topic, and see what your guys' thoughts are in return. I've touched on these things to a certain extent with sadikyo, bikerspade, and superg like I said, but I'm very curious what input others like Jackal and F1ReB4LL might have as well.
First, just a quick general explanation of the spreadsheet: I started it originally to keep track of all the audio CDs I was finding that had non-zero data in the lead-in and lead-out, in order to have plenty of data and test cases to work from once testing started in earnest on putting audio CD offset auto-detection into practice. But one thing that kept bothering me was how extreme these supposed "offset" values were on some discs. The highest of them were throwing off the track alignments on some of my CDs by up to 2 seconds, and I don't think it's even physically possible for glass mastering equipment to offset CD data by that much. The most likely explanation for these huge amounts of overflow data, as far as I can tell, is audio data being improperly trimmed by the audio mastering engineer at the studio, coupled with negligent Red Book compliance screening at the manufacturing facility.
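Just to put those numbers in perspective, here's some quick back-of-the-envelope arithmetic (illustrative only, the specific example values aren't from the spreadsheet): CD audio runs at 44,100 stereo samples per second, so a sample-count offset converts to time directly.

```python
# CD audio: 44,100 stereo 16-bit samples per second; one 2352-byte
# sector holds 588 stereo samples (2352 / 4).
SAMPLES_PER_SECOND = 44100
SAMPLES_PER_SECTOR = 588

def describe_offset(samples: int) -> str:
    """Express a sample offset as milliseconds and sectors."""
    ms = samples / SAMPLES_PER_SECOND * 1000
    sectors = samples / SAMPLES_PER_SECTOR
    return f"{samples:+d} samples = {ms:+.1f} ms ({sectors:+.2f} sectors)"

# Typical factory write offsets are tens to hundreds of samples,
# i.e. small fractions of a single sector:
print(describe_offset(-11))      # -0.2 ms
print(describe_offset(+667))     # +15.1 ms
# The extreme values I'm describing are orders of magnitude larger:
print(describe_offset(-40000))   # roughly -907 ms, ~68 sectors
print(describe_offset(-88200))   # a full -2000 ms, i.e. 2 seconds
```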
But at the very least, it's evident that there are two phenomena of different origin and nature here, and it's then a matter of figuring out a reliable way to distinguish between them: overflow data due to standard, run-of-the-mill manufacturing offset on the one hand, and overflow data due to sloppy mastering on the other. So anyway, working on the spreadsheet with that question in mind, I started collecting data on all the discs I could, hoping that some useful patterns might present themselves.
I also, to establish some context for the problem, started keeping track of other previously confirmed offset values (from discs with data tracks), along with those discs' ringcodes (with a focus on mastering SIDs). The correlation between those two factors is obviously not consistent enough to ever make conclusive judgments from, but my thinking was that, at the very least, this data could be used as a "quality check" tool of some kind. So when we encounter one of these extreme -40,000 or whatever "offsets," we could simply ask: "Okay, is there a basis for this offset in the realities of the manufacturing process that created this disc?" i.e. "Does the LBR that cut the glass master for this disc have any history of producing other glass masters with this same strange offset?" Given the very limited evidence available from the data contained in the audio CDs themselves, I thought this could at least be a useful practical grounding when we're approaching these kinds of strange edge cases.
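To make the "quality check" idea concrete, here's a rough sketch of the kind of lookup I have in mind. The table contents and names are hypothetical, purely to illustrate the logic; nothing like this is actually implemented anywhere:

```python
# Hypothetical history of confirmed offsets, keyed by mastering SID
# (i.e. effectively by the LBR that cut the glass master). Values are
# offsets, in samples, previously confirmed for discs with that SID.
KNOWN_OFFSETS_BY_SID = {
    "IFPI L001": [-12, +30, +667],
    "IFPI L025": [-647, -1000],
}

def offset_is_plausible(mastering_sid: str, candidate: int) -> bool:
    """Does this LBR have any history of producing this offset?"""
    history = KNOWN_OFFSETS_BY_SID.get(mastering_sid, [])
    return candidate in history

# A -40,000 "offset" with no precedent for that LBR is a red flag:
print(offset_is_plausible("IFPI L001", 667))     # True  -> has precedent
print(offset_is_plausible("IFPI L001", -40000))  # False -> suspect
```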
I also started inspecting CDs using superg's pregap "perfect offset" method and keeping track of the results. That was a huge revelation, and as far as I know, it's the only way to directly perceive, on an audio CD itself, what its original, true manufacturing offset was. The only downside is that it's unfortunately not readily visible on most audio CDs: it takes a very particular arrangement of the data for it to be visible, and even when it is, the evidence is often not entirely clear-cut. But ultimately, I was able to use the pregap method to determine, with reasonable confidence, the true offset of about 10-15% of the CDs I inspected. This data is all documented in the spreadsheet as well, including a track-by-track breakdown of each disc I inspected that way.
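For what it's worth, here's my rough understanding of what one common case of the pregap inspection boils down to, sketched in code. To be clear, this is my own paraphrase of the idea, not superg's actual implementation: when audio from the previous track bleeds into a pregap that was authored as digital silence, and that audio-to-silence transition was sector-aligned at mastering time, its displacement from the subcode-indicated boundary exposes the factory write offset directly.

```python
def detect_write_offset(samples: list[tuple[int, int]], boundary: int,
                        window: int = 5880) -> int | None:
    """samples: decoded stereo sample pairs around a track boundary;
    boundary: the sample index where the subcode says the pregap
    begins. Finds where trailing audio gives way to digital silence
    near the boundary; if that transition was sector-aligned when the
    disc was mastered, its displacement from the boundary is the
    factory write offset, in samples."""
    lo, hi = max(0, boundary - window), min(len(samples), boundary + window)
    last_audio = None
    for i in range(lo, hi):
        if samples[i] != (0, 0):
            last_audio = i
    if last_audio is None:
        return None  # nothing but silence in the window; not applicable
    return (last_audio + 1) - boundary  # + = written late, - = early
```

This also hints at why the evidence is so often ambiguous: if the authored transition wasn't actually sector-aligned, or the "silence" is merely very quiet, the displacement you measure means nothing.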
There are also some other things recorded, such as PVDs for some data discs, notes on offset-related "alt" pressings (i.e. CDs that are identical to each other in every way but offset), including the offset values that separate them, and various other bits of info.
Before I post the spreadsheet though, I want to preface it by explaining some of the primary concerns that have occurred to me as I've been putting all this data together.
The biggest thing that concerns me, I'll just say it plainly--and I don't mean it as a criticism of anyone or anything, more as an observation of the ambiguity/difficulty of the problem--is that we have such a clean, tight, conclusive method for determining the original manufacturing offset of data track CDs, but now that we've come to audio CDs, we may very well be left resorting to a "good enough" type of approach. There's so little evidence available for determining the true value of each disc that it's almost justifiable to simply say, "well, let's just shift what we can, capture all the data, and call it good."
The thing is, though: with data track CDs we can of course determine the true offset directly and unambiguously, but even if we couldn't, whatever offset value we applied (as far as I know, correct me if I'm wrong) would still have no real tangible effect on the playback of the disc image itself; the file system is still accessed the same way, and ultimately nothing in the user experience changes.
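As a point of contrast, here's roughly why the data track case is so clean. The 12-byte sync pattern at the start of every data sector gives you an absolute anchor: find where it actually landed in the raw read versus where a sector boundary should fall, and the difference is the offset. A minimal sketch, which glosses over the scrambling of everything after the sync and over decoding the sector address (real tools use the address to resolve ambiguity, but typical offsets are far smaller than half a sector, so folding suffices here):

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])  # 12-byte data sector sync
BYTES_PER_SECTOR = 2352

def data_track_offset(raw: bytes) -> int | None:
    """raw: a data track read in raw/audio mode, starting at the byte
    where the subchannel says the track begins. The sync pattern marks
    the true start of each 2352-byte sector, so its misalignment from
    a sector boundary is the write offset."""
    pos = raw.find(SYNC)
    if pos == -1:
        return None
    byte_shift = pos % BYTES_PER_SECTOR  # displacement within a sector
    # Fold into the +/- half-sector range, then convert bytes to
    # samples (4 bytes per stereo sample).
    if byte_shift > BYTES_PER_SECTOR // 2:
        byte_shift -= BYTES_PER_SECTOR
    return byte_shift // 4
```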
With audio CDs, on the other hand, adjusting the offset between the audio data and the subcode data has a very direct and tangible effect on the playback of the CD. Namely, it changes the point at which the audio on the album starts and ends, as well as all the start and end points of the tracks in between; essentially, it shifts the entire framework of the album. This is the type of thing that music collectors and enthusiasts are going to notice and care about when they're perusing our database, or listening to favorite albums they've dumped and preserved using our methods. The massive 20,000+ sample offsets especially, but even the smaller random values (e.g. -11, -17, etc.), being arbitrary like that, will likely irk many music purists and preservationists if they can't be justified in any foundational way. Audio CD offsets are almost totally ambiguous, like I said and as we well know, but I think that, for all those reasons, we should be even more careful, even more restrained and discretionary than we are with data track CDs when it comes to the types of offset values we allow to be applied to them.
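To be concrete about what "applying an offset" actually does to a dump: correcting by N samples just slides the entire audio stream N × 4 bytes against the subcode-defined track grid, padding one end and truncating the other. A sketch (sign conventions vary between tools; this assumes a positive value means the audio was written late):

```python
SAMPLE_BYTES = 4  # 16-bit stereo: 2 bytes x 2 channels

def apply_offset(audio: bytes, offset_samples: int) -> bytes:
    """Shift the audio stream against the (fixed) subcode track grid.
    A positive correction discards samples from the front and pads the
    end with silence; a negative one does the reverse. Every track
    boundary on the disc effectively moves by the same amount."""
    shift = offset_samples * SAMPLE_BYTES
    if shift >= 0:
        return audio[shift:] + b"\x00" * shift
    return b"\x00" * -shift + audio[:shift]
```

Which is exactly the point: whatever value we pick gets baked into every track transition on the preserved album.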
Couple that with what I mentioned earlier, about some of those bigger overflow data values likely not being a result of offset at all... Anyway, to sum up my basic point: in my opinion, the number of samples that happen to be protruding from the program area is not justifiable evidence upon which to determine and correct for the manufacturing offset value, and because the applied value can and does have a tangible effect on the accuracy of the playback experience being preserved, we should be exercising all due restraint in making these types of changes to audio CD dumps.
I have a few more things to say regarding this (and even a few ideas that might be workable to enhance our accuracy), but I don't want to bombard you with everything all at once, and I'd like to hear your thoughts on these specific concerns first. I'll share the spreadsheet itself in my next post, but for now, thank you very much for reading. Cheers.
[EDIT: Got ignored. Well, for posterity's sake, and so it doesn't go to waste, here's the spreadsheet: audio CD test data, as well as a rough draft of some other things I was working on. Maybe I'll come back and finish it at some point, just for fun.]
https://docs.google.com/spreadsheets/d/1Gknkby9nF3hW5CpVeVsPFJCn4gyADhLR8HF0LNRpgMU/edit?usp=sharing