As some of you are already aware, some CDs have a mastering issue where the write offset changes across the disc. For standardization purposes, I will be calling this "offset shift".
Historically, we knew of a couple of examples, such as:
Philips Media Presents: CD-i Games: Highlights-Previews-Gameclips: http://redump.org/disc/74810/
The Cranberries: Doors and Windows: http://redump.org/disc/99290/
CD-I CD Interactive: http://redump.org/disc/97023/
While working on redumper's lead-in/lead-out dumping functionality for data discs, I noticed that the offset actually shifts multiple times, starting from the lead-in, and even propagates into the lead-out. Analyzing this for discs with data tracks is possible because, if the first track is a data track, the lead-in usually consists of scrambled empty sectors, and likewise, if the last track is a data track, the following lead-out is also scrambled empty sectors.
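For anyone who wants the gist of how such a measurement can work, here is a minimal sketch (not the actual redumper code; the function name, the byte-stream layout and the sign convention are just assumptions for illustration). The idea is that scrambled empty sectors still carry the plain 12-byte sync pattern, so the write offset can be measured by finding that pattern near a sector's nominal position and converting the byte displacement into samples:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

constexpr size_t SECTOR_SIZE = 2352; // bytes per raw sector (588 samples * 4 bytes each)

// 12-byte data sector sync pattern; it is excluded from scrambling, so it is visible
// as-is even inside the scrambled empty sectors of the lead-in / lead-out.
static const uint8_t SYNC[12] = {0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                                 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00};

// Given the disc's raw byte stream and the byte position where a data sector would
// start at write offset zero, locate a sync pattern near it and return its
// displacement in samples (assumed convention: positive = data written later than
// nominal). A real implementation would also descramble the header and check its MSF
// to confirm which sector the sync belongs to; that matters when the shift approaches
// or exceeds half a sector, as it does in the example below.
std::optional<int32_t> measure_offset(const std::vector<uint8_t> &stream, size_t nominal_pos)
{
    const int32_t window = 2 * (int32_t)SECTOR_SIZE; // search +/- 2 sectors around nominal
    for(int32_t d = -window; d <= window; d += 4)    // write offsets are whole samples (4 bytes)
    {
        int64_t pos = (int64_t)nominal_pos + d;
        if(pos < 0 || (size_t)pos + sizeof(SYNC) > stream.size())
            continue;
        if(!std::memcmp(&stream[(size_t)pos], SYNC, sizeof(SYNC)))
            return d / 4; // byte displacement -> sample offset
    }
    return std::nullopt; // no sync nearby, cannot measure at this position
}
```

Running something like this across the lead-in, the tracks and the lead-out is what produces the per-range offset picture shown in the example below.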
Example for http://redump.org/disc/74810:
TOC:
track 1 { data }
index 00 { LBA: [ -150 .. -1], length: 150, MSF: 00:00:00-00:31:01 }
index 01 { LBA: [ 0 .. 2176], length: 2177, MSF: 00:02:00-00:31:01 }
track 2 { data }
index 00 { LBA: [ 2177 .. 2324], length: 148, MSF: 00:31:02-21:55:39 }
index 01 { LBA: [ 2325 .. 98514], length: 96190, MSF: 00:33:00-21:55:39 }
track A { data }
index 01 { LBA: [ 98515 .. 98612], length: 98, MSF: 21:55:40-21:56:62 }
offsets:
LBA: [ -2259 .. -150], offset: -294, count: 2110
LBA: [ -149 .. 2175], offset: -613, count: 2325
LBA: [ 2176 .. 98514], offset: -609, count: 96339
LBA: [ 98515 .. 98614], offset: -882, count: 100
As you can see from this example, the first offset shift happens between the lead-in and the pre-gap (-294 to -613), and the others happen roughly "between" tracks (-613 to -609 and -609 to -882), although the boundaries are a little imprecise. As the lead-out is internally just another track, the shift propagates there too.
Digging deeper, I uncovered that there are many more such offset-shifting discs, and most, if not all, PC data discs where a couple of the last data track sectors are "corrupted" (descramble failed) are actually offset-shifting discs. As redumper outputs detailed descramble statistics, I have been contacted numerous times by different people, including our mods, to check a particular data dump log and make sure it is correct, and analyzing these cases I realized it's the same offset shifting issue.
Why is this important?
Every offset shift transition happens gradually across multiple sectors, and due to some peculiar mastering detail that we don't understand yet, these transitional sectors are randomly corrupted. Such corruption makes it difficult for dumping software to decide what to do with these sectors and whether to attempt to descramble them.
As my recent findings hint that there are a lot of such discs, the purpose of this topic is to standardize how we preserve such transitions, so that it follows redump.org preservation standards and is uniform across dumping software (which is basically DIC and redumper lol).
As of today, redumper dumps such discs with one global disc write offset, which is detected based on the first sector of the first data track (simplified). This is the default behaviour.
In addition to that, in redumper I provide an option, "--correct-offset-shift", which follows the offset shift changes, and such a dump can be added to redump.org as a (Fixed Dump) edition. Regardless of whether this option is used, we need to standardize our handling of such transitions.
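To make the difference concrete, here is a rough sketch of what "one global offset" versus "following the shifts" means when extracting sectors. Everything here (OffsetRange, offset_for_lba, read_sector, the LBA -150 stream layout and the sign convention) is an illustrative assumption, not redumper's internals:

```cpp
#include <cstdint>
#include <vector>

constexpr int SECTOR_SIZE = 2352;

// One detected offset range, mirroring the "offsets:" listing above (hypothetical type).
struct OffsetRange
{
    int32_t lba_start; // first LBA the offset applies to
    int32_t lba_end;   // last LBA, inclusive
    int32_t offset;    // write offset in samples
};

// Default behaviour: every sector is read with the same global offset.
// "Correct offset shift" behaviour: each sector is read with the offset of the
// range it falls into, so the data stays aligned after every shift.
int32_t offset_for_lba(const std::vector<OffsetRange> &ranges, int32_t lba, int32_t global_offset, bool correct_shift)
{
    if(correct_shift)
        for(const auto &r : ranges)
            if(lba >= r.lba_start && lba <= r.lba_end)
                return r.offset;
    return global_offset;
}

// Copy one raw sector out of the whole-disc byte stream, compensating for a write
// offset. Assumed conventions: the stream starts at LBA -150 (start of the first
// pre-gap) and a positive offset means the data sits offset*4 bytes later than nominal.
void read_sector(const std::vector<uint8_t> &stream, int32_t lba, int32_t offset, uint8_t out[SECTOR_SIZE])
{
    int64_t pos = (int64_t)(lba + 150) * SECTOR_SIZE + (int64_t)offset * 4;
    for(int64_t i = 0; i < SECTOR_SIZE; ++i)
        out[i] = (pos + i >= 0 && pos + i < (int64_t)stream.size()) ? stream[(size_t)(pos + i)] : 0;
}
```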
Here's how these transitional sectors can be handled:
1. Leave transitional sectors intact.
2. Force descramble of all transitional sectors.
3. Intelligently detect whether a sector is scrambled based on a combination of content criteria and, if it is, try to descramble it.
Right now, both DIC and redumper do a variation of (3). More often than not, this descrambles some sectors and leaves others intact, i.e. you get a mix of god knows what, and there is no way to recover scrambled content that is 1:1 with the original. In addition, redumper does it differently, which allows it to descramble "better", but that's not the point here. The point is that (3) doesn't yield consistent results, and those results aren't 1:1 aligned with the source (scrambled) material.
On the other hand, (2) is the sweet spot, as it is consistent and the primary scrambled image can be reconstructed 1:1.
Finally, (1) is a compromise where we lose 1:1 but keep some sort of consistency.
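For reference, here is a minimal sketch of what (2) and a (3)-style check boil down to. The scrambling sequence itself is defined by ECMA-130 Annex B; force_descramble and looks_scrambled, and the particular header criterion used there, are illustrative assumptions and not what DIC or redumper actually do:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Build the 2340-byte ECMA-130 Annex B scrambling sequence: an LFSR with polynomial
// x^15 + x + 1 seeded with 0x0001; the output bit is the register LSB, and data bits
// are taken least significant bit first.
std::array<uint8_t, 2340> make_scramble_table()
{
    std::array<uint8_t, 2340> table{};
    uint16_t lfsr = 0x0001;
    for(auto &b : table)
    {
        for(int bit = 0; bit < 8; ++bit)
        {
            b |= (lfsr & 1) << bit;
            uint16_t fb = (lfsr ^ (lfsr >> 1)) & 1; // feedback = bit0 XOR bit1
            lfsr = (lfsr >> 1) | (fb << 14);
        }
    }
    return table;
}

// Option (2): force descramble of a 2352-byte raw sector in place. The 12-byte sync is
// not scrambled, so only bytes 12..2351 are XOR'ed.
void force_descramble(uint8_t *sector)
{
    static const auto table = make_scramble_table();
    for(size_t i = 0; i < table.size(); ++i)
        sector[12 + i] ^= table[i];
}

// Option (3) flavour: one possible content criterion (an assumption for illustration).
// A trial descramble of a copy is accepted only if the sector starts with the sync
// pattern and the descrambled header carries a BCD MSF address matching the expected LBA.
bool looks_scrambled(const uint8_t *sector, int32_t lba)
{
    static const uint8_t SYNC[12] = {0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                                     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00};
    if(std::memcmp(sector, SYNC, sizeof(SYNC)))
        return false;

    uint8_t copy[2352];
    std::memcpy(copy, sector, sizeof(copy));
    force_descramble(copy);

    auto to_bcd = [](uint32_t v) -> uint8_t { return (uint8_t)((v / 10) << 4 | (v % 10)); };
    uint32_t lba150 = (uint32_t)(lba + 150);
    return copy[12] == to_bcd(lba150 / 75 / 60) && copy[13] == to_bcd(lba150 / 75 % 60) && copy[14] == to_bcd(lba150 % 75);
}
```

Because the XOR is its own inverse, (2) can always be undone to get back the exact scrambled bytes, which is the 1:1 property I'm talking about; a (3)-style criterion inevitably has to guess on corrupted transitional sectors, which is where the inconsistency comes from.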
I would like to hear opinions on this. Just please, let's keep on topic; I don't want the conversation to go elsewhere.