1 (edited by superg 2023-02-28 06:55:15)

As some of you are already aware, some CDs have a mastering issue where the write offset changes across the disc. For standardization purposes, I will call this an "offset shift".
Historically, we knew of a couple of examples, such as:
Philips Media Presents: CD-i Games: Highlights-Previews-Gameclips: http://redump.org/disc/74810/
The Cranberries: Doors and Windows: http://redump.org/disc/99290/
CD-I CD Interactive: http://redump.org/disc/97023/

While working on the redumper lead-in/lead-out dumping functionality for data discs, I noticed that the offset actually shifts multiple times, starting from the lead-in and even propagating into the lead-out. Analyzing this on discs with a data track is possible because, if the first track is a data track, the lead-in usually consists of scrambled empty sectors, and likewise, if the last track is a data track, the following lead-out track also consists of scrambled empty sectors.

Example for http://redump.org/disc/74810:
TOC:

track 1 {  data }
  index 00 { LBA: [  -150 ..     -1], length:    150, MSF: 00:00:00-00:31:01 }
  index 01 { LBA: [     0 ..   2176], length:   2177, MSF: 00:02:00-00:31:01 }
track 2 {  data }
  index 00 { LBA: [  2177 ..   2324], length:    148, MSF: 00:31:02-21:55:39 }
  index 01 { LBA: [  2325 ..  98514], length:  96190, MSF: 00:33:00-21:55:39 }
track A {  data }
  index 01 { LBA: [ 98515 ..  98612], length:     98, MSF: 21:55:40-21:56:62 }

offsets:

LBA: [ -2259 ..   -150], offset: -294, count: 2110
LBA: [  -149 ..   2175], offset: -613, count: 2325
LBA: [  2176 ..  98514], offset: -609, count: 96339
LBA: [ 98515 ..  98614], offset: -882, count: 100

As you can see from this example, the first offset shift happens between the lead-in and the pregap, and the others happen "between" tracks, although a little imprecisely. As the lead-out is internally just another track, the shift propagates there too.
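To make the offsets table concrete, here is a minimal Python sketch of how per-sector offsets could be computed and collapsed into LBA ranges. It assumes (hypothetically) that for each sector whose sync was found in the raw scrambled stream we already know its LBA from the descrambled header and the byte position of its sync. The sign convention used here (positive means the sector starts later than its nominal position) is an assumption and may not match what redumper prints. One sample is 4 bytes (16-bit stereo).

```python
BYTES_PER_SECTOR = 2352
BYTES_PER_SAMPLE = 4  # one 16-bit stereo sample


def offset_in_samples(sync_pos: int, lba: int, start_lba: int) -> int:
    """Offset of one sector, in samples, relative to its nominal position.

    Hypothetical input: `sync_pos` is the byte position of the sector's sync
    in a raw stream whose byte 0 would be LBA `start_lba` at zero offset.
    """
    nominal = (lba - start_lba) * BYTES_PER_SECTOR
    delta = sync_pos - nominal
    if delta % BYTES_PER_SAMPLE:
        raise ValueError("offset is not sample-aligned")
    return delta // BYTES_PER_SAMPLE


def group_offsets(pairs):
    """Collapse LBA-sorted (lba, offset) pairs into (first, last, offset, count) ranges."""
    ranges = []
    for lba, offset in pairs:
        # extend the current range only if the offset is unchanged and LBAs are contiguous
        if ranges and ranges[-1][2] == offset and ranges[-1][1] == lba - 1:
            first, _, _, count = ranges[-1]
            ranges[-1] = (first, lba, offset, count + 1)
        else:
            ranges.append((lba, lba, offset, 1))
    return ranges


if __name__ == "__main__":
    pairs = [(-1, -613), (0, -613), (1, -613), (2, -609), (3, -609)]
    for first, last, offset, count in group_offsets(pairs):
        print(f"LBA: [{first:6} .. {last:6}], offset: {offset}, count: {count}")
```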

Digging deeper, I found that there are many more such offset-shifting discs, and most, if not all, PC data discs where a couple of the last data track sectors are "corrupted" (descrambling failed) are actually offset-shifting discs. As redumper outputs detailed descrambling statistics, I was contacted numerous times by different people, including our mods, to check a particular data dump log to make sure it was correct, and analyzing these cases I realized it's the same offset shifting issue.

Why is this important?
Every offset shift transition gradually spans multiple sectors, and due to some peculiar mastering detail that we don't understand yet, these sectors are randomly corrupted. Such corruption makes it difficult for dumping software to decide what to do with these sectors and whether to attempt to descramble them.
As my recent findings hint that there are a lot of such discs, the purpose of this topic is to standardize how we preserve such transitions, so that it follows redump.org preservation standards and is uniform across dumping software (which basically means DIC and redumper, lol).

As of today, redumper dumps such discs with one global disc write offset which is detected based on the first sector of the first data track (simplified). This is the default behaviour.
In addition to that, I provide an option in redumper, "--correct-offset-shift", which follows the offset shift changes, and such a dump can be added to redump.org as a (Fixed Dump) edition. Regardless of whether this option is used, we need to standardize our handling of such transitions.

Here's how that can be handled:
1. Leave transitional sectors intact.
2. Force descrambling of all transitional sectors.
3. Intelligently detect whether the sector is scrambled based on a combination of content criteria and, if it is, try to descramble it.

Right now, both DIC and redumper are doing a variation of (3). More often than not, this descrambles some sectors and leaves others intact, i.e. you get a mix of god knows what, and there is no way to recover scrambled content that is 1:1 with the original. In addition to that, redumper does it differently, which allows it to descramble "better", but that is not the point here. The point is that (3) doesn't yield consistent results, and these results aren't 1:1 aligned with the source (scrambled) material.

On the other hand, (2) is the sweet spot, as it is consistent and the primary scrambled image can be reconstructed 1:1.
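To illustrate why (2) keeps the 1:1 property: forced descrambling is just an XOR with the fixed ECMA-130 scrambler sequence over bytes 12..2351 of the sector, so applying it twice returns the original bytes. A minimal Python sketch; the helper names are hypothetical, and this is not redumper's or DIC's code.

```python
SECTOR_SIZE = 2352
SYNC_SIZE = 12  # the 12-byte sync pattern is never scrambled


def make_scramble_table(size: int = SECTOR_SIZE - SYNC_SIZE) -> bytes:
    """ECMA-130 Annex B scrambler: 15-bit LFSR, x^15 + x + 1, preset to 1, LSB first."""
    table = bytearray(size)
    shift = 1
    for i in range(size):
        value = 0
        for bit in range(8):
            value |= (shift & 1) << bit
            carry = (shift & 1) ^ ((shift >> 1) & 1)
            shift = (carry << 14) | (shift >> 1)
        table[i] = value
    return bytes(table)


SCRAMBLE_TABLE = make_scramble_table()


def force_descramble(sector: bytes) -> bytes:
    """Option (2): XOR everything after the sync, whatever the content looks like."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a raw 2352-byte sector")
    body = bytes(a ^ b for a, b in zip(sector[SYNC_SIZE:], SCRAMBLE_TABLE))
    return sector[:SYNC_SIZE] + body


# XOR is its own inverse, so the scrambled original is always recoverable 1:1.
raw = bytes(range(256)) * 9 + bytes(range(48))  # arbitrary 2352-byte test sector
assert force_descramble(force_descramble(raw)) == raw
```

With (3), by contrast, the per-sector decision is not recorded anywhere in the image, so you can no longer tell which sectors went through the XOR.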

Finally, (1) is a compromise where we lose 1:1 but keep some sort of consistency.

I would like to hear opinions on this. Just please, let's stay on topic; I don't want the conversation to drift elsewhere.

Which option produces the most functional dump?

All my posts and submission data are released into Public Domain / CC0.

3 (edited by superg 2023-03-01 05:27:01)

user7 wrote:

Which option produces the most functional dump?

Probably any of them: transitional sectors are usually at the end of the track and were most likely originally zeroed and unused.

EDIT: to clarify, the most functional one would be any of these 3 proposed options, but with the "--correct-offset-shift" option on; that's probably what you wanted to hear!

superg wrote:

As some of you are already aware, some CDs have a mastering issue where the write offset changes across the disc. For standardization purposes, I will call this an "offset shift".

That's an incorrect term. There's only 1 offset per disc, while you're talking about leftovers from earlier burning/dumping mastering stages. Those leftovers are physically present on the disc and need to be kept, since they belong to the disc data.

superg wrote:

Here's how that can be handled:
1. Leave transitional sectors intact.
2. Force descrambling of all transitional sectors.
3. Intelligently detect whether the sector is scrambled based on a combination of content criteria and, if it is, try to descramble it.

All 3 are wrong, since those leftovers often contain duped data, and plain descrambling will lead to duped sectors. Also, there are quite a lot of cases where those leftovers appear as part of the first post-data audio track, like http://redump.org/disc/7986/

The main dump should be always left as is, with anything unusual left scrambled.

As for the additional "(Fixed)" dumps we've discussed earlier, you should probably try to descramble all the sectors, but a descrambled sector shouldn't be added to the dump if exactly the same sector already exists in the descrambled dump (not to be confused with twin-sector-based protections, where 2 different sectors with the same headers exist).
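The deduplication rule for (Fixed) dumps could be sketched like this. It is a minimal illustration assuming the candidates arrive as (LBA, descrambled 2352-byte sector) pairs in disc order; `build_fixed_track` is a hypothetical name. Comparing whole sectors byte for byte (rather than only headers) keeps twin-sector protections intact, since those share headers but differ in content, and legitimate sectors never collide with each other because their headers carry different MSFs.

```python
from typing import Iterable, List, Tuple

Sector = Tuple[int, bytes]  # (LBA, descrambled 2352-byte sector)


def build_fixed_track(candidates: Iterable[Sector]) -> List[Sector]:
    """Keep a descrambled sector unless a byte-identical one was already kept."""
    seen = set()
    kept = []
    for lba, data in candidates:
        if data in seen:
            continue  # exact duplicate of an earlier sector: treat it as a leftover
        seen.add(data)
        kept.append((lba, data))
    return kept


if __name__ == "__main__":
    header = bytes([0x00] + [0xFF] * 10 + [0x00, 0x00, 0x02, 0x00, 0x01])
    a = header + bytes(2336)
    b = header + bytes([1]) * 2336  # "twin": same header, different content
    print([lba for lba, _ in build_fixed_track([(0, a), (1, b), (2, a)])])
```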

There are also cases where such sector shifts appear in the pregap (sectors -150 to -1), so we're losing part of the vital data for them by not preserving that area; http://redump.org/disc/73334/ is an example. In this case, the disc offset (= pregap offset) goes into the "Write offset" field and the "new" user area offset goes into the comments. Since we're not preserving the pregap, such discs aren't added as fully scrambled, but that's the only exception; all the other cases should be added as a pair of unfixed + fixed dumps (also, such a pair is a workaround, and ideally I would replace them with a pair of scrambled + descrambled dumps).

Whether to fix the ones with garbage data in audio and add them as (Fixed) is up to you. If there's a shifted piece of actual audio data in the lead-out area, it's probably worth it.

F1ReB4LL wrote:

That's an incorrect term. There's only 1 offset per disc, while you're talking about leftovers from earlier burning/dumping mastering stages. Those leftovers are physically present on the disc and need to be kept, since they belong to the disc data.

It does look to me like a mastering issue. I don't know its nature, but the reality is that the data "shifts" throughout the disc. I think the term "shift" is correct to use, as the data shifts relative to its previous position. Also, there are some data leftovers, but those are incomplete sectors. If you analyze these transitional sectors, you will see that the corruption gradually screws up pairs of bytes within samples, aligned to the sample size.
Propose a better term; we have to call it something.

F1ReB4LL wrote:

All 3 are wrong, since those leftovers often contain duped data, and plain descrambling will lead to duped sectors. Also, there are quite a lot of cases where those leftovers appear as part of the first post-data audio track, like http://redump.org/disc/7986/

I don't think we are on the same page here. There are no duped sectors and it's not random garbage. These are real damaged sectors with plausible LBAs. Do you want to see some bytes? I believe I can demonstrate.

F1ReB4LL wrote:

The main dump should be always left as is, with anything unusual left scrambled.

What you're talking about is (1); this is exactly what you want. But I am saying that this is not what DIC does today: sectors from that damaged portion are half scrambled, half descrambled.

superg wrote:

I don't think we are on the same page here. There are no duped sectors and it's not random garbage. These are real damaged sectors with plausible LBAs. Do you want to see some bytes? I believe I can demonstrate.

I wasn't talking about "random garbage". As I've said, it often appears when someone burns a disc, dumps it and burns it again at some point in its mastering process, so scrambled sector leftovers appear, and it is quite common for a sector to be present on the disc properly and also have some additional garbage leftovers with exactly the same bytes, just duped. Descrambling those will result in duped sectors.

Also, you weren't talking about "damaged" sectors, but about shifted ones. There are cases where the scrambled sectors aren't shifted but are damaged (or both shifted and damaged); that's a different story. I think damaged sectors are still descrambleable, unless they are filled with random bytes or padded with non-random bytes.

superg wrote:
F1ReB4LL wrote:

The main dump should be always left as is, with anything unusual left scrambled.

What you're talking about is (1); this is exactly what you want. But I am saying that this is not what DIC does today: sectors from that damaged portion are half scrambled, half descrambled.

DIC is supposed to work only with 2352-byte pieces starting from byte 0 of sector 0, ignoring any shifts, so any sector is supposed to be either descrambleable (if there are no shifts) or not descrambleable (if there are shifts). If sarami has changed this logic, it's better to call him into this discussion for an explanation. Half scrambled/half descrambled output may appear when there are no shifts but the sectors are damaged, as I've mentioned above; if you're talking about that kind of case, that behavior is correct.

7 (edited by Jackal 2023-03-02 20:41:26)

Here's another example: http://redump.org/disc/66596/

Some stupid questions:

- http://redump.org/disc/74810/ - Does the original disc play on a CD-i? What about a backup of the unfixed dump?

- http://redump.org/disc/99290/ - This has 8,848 errors despite being fixed? What's going on?

F1ReB4LL wrote:
superg wrote:

As some of you are already aware, some CDs have a mastering issue where the write offset changes across the disc. For standardization purposes, I will call this an "offset shift".

That's an incorrect term. There's only 1 offset per disc, while you're talking about leftovers from earlier burning/dumping mastering stages. Those leftovers are physically present on the disc and need to be kept, since they belong to the disc data.

There are some clear cases mentioned in this topic where bad mastering causes dumps to descramble incorrectly, creating tons of erroneous data sectors, because some samples are missing or added at random positions in a data track. That's the main focus of this topic, right? I don't remember whether this also makes the original discs non-functional, or whether the drive performs some sort of on-the-fly correction to output a correct sector.

And as F1ReB4LL also pointed out on Discord, there seem to be many cases of discs with scrambled data in the Track02 pregap after offset correction, for example: http://redump.org/disc/1770/ + http://redump.org/disc/1716/ + http://redump.org/disc/7986/
And if I remember correctly, this disc http://redump.org/disc/5479/ also has garbage at the start of the audio. If you remove those bytes, the track matches the PS1 track, so it seems to have been ripped from the PS1 version. And IIRC the same was true for Fighting Force PC vs. PS1. But I'm not sure anymore, as it's been 13-15 years since those were first dumped; time sure flies.
It's unclear whether this is caused by, for example, the gold master disc being a CD-R that was burned track-at-once or something, but the most logical explanation is that an audio track was copied with offset garbage and then burned again. But this is a different issue that we don't have to discuss here?

IIRC Truong / Ripper theorized that erroneous sectors with garbage bytes at the end of a data track were the result of a "split sector" or "half sector" or whatever they called it, i.e. part data / part audio. If you check the scrambled output, is it data and zeroes interleaved, or does the data stop at some position with only zeroes after that?
But errors at the end of the data track also seem to be a different issue, and since the remainder of the disc is audio tracks, performing offset shift correction for such discs does not improve the dump in any meaningful way?

There were some examples recently where DIC was leaving sectors scrambled inside a data track despite a correct sync/header and mostly correct data, resulting in different dumps than before. So either the default descrambling behavior was changed by sarami at some point, or it's a bug. If a sector is inside a data track and the vast majority of it is data, IMO there's no sense in leaving it scrambled, and the descrambled data is indeed more meaningful.

Jackal wrote:

- http://redump.org/disc/74810/ - Does the original disc play on a CD-i? What about a backup of the unfixed dump?

The original definitely works on a CD-i. As for the unfixed backup, it boils down to whether a scrambled image can be burned as is; I'm not too familiar with writing limitations. If you can't write it scrambled, a burned unfixed dump will not work, as the writer will try to rescramble the data track and the result will be garbage.

- http://redump.org/disc/99290/ - This has 8,848 errors despite being fixed? What's going on?
All these 8,848 sectors are still zeroed even after shift correction; this is normal.

Jackal wrote:

There are some clear cases mentioned in this topic where bad mastering causes dumps to descramble incorrectly, creating tons of erroneous data sectors, because some samples are missing or added at random positions in a data track. That's the main focus of this topic, right?

Yes, this is correct.

Jackal wrote:

I don't remember whether this also makes the original discs non-functional, or whether the drive performs some sort of on-the-fly correction to output a correct sector.

Yes, I think the same. The drive seeks for the sync frame, and if it doesn't find it, I think it tries to reposition, so these discs work on players and PCs. Actually, a good experiment would be to dump such a disc using the BE opcode (READ CD) as data and see what the drive returns; I will do that.
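For reference, the READ CD command mentioned here (opcode 0xBE in the SCSI MMC command set) is a 12-byte CDB. Below is a minimal Python sketch that only builds the bytes; actually sending it needs an OS-specific pass-through (e.g. SG_IO on Linux or SPTI on Windows), which is not shown. The defaults are illustrative choices: expected sector type 0 ("any"), and byte 9 = 0xF8 to request sync, all headers, user data and EDC/ECC, i.e. the full 2352 bytes.

```python
def read_cd_cdb(lba: int, count: int = 1, sector_type: int = 0,
                main_flags: int = 0xF8, subchannel: int = 0) -> bytes:
    """Build a 12-byte MMC READ CD (0xBE) command descriptor block.

    sector_type: 0 = any, 1 = CD-DA, 2 = Mode 1, 3 = Mode 2 formless,
                 4 = Mode 2 Form 1, 5 = Mode 2 Form 2
    main_flags:  0xF8 = sync + all headers + user data + EDC/ECC (2352 bytes)
    subchannel:  0 = none, 1 = raw P-W, 2 = formatted Q, 4 = R-W
    """
    if not 0 <= sector_type <= 5:
        raise ValueError("invalid expected sector type")
    if not 0 < count < 1 << 24:
        raise ValueError("transfer length must fit in 3 bytes")
    return bytes([
        0xBE,
        sector_type << 2,                         # expected sector type, bits 4..2
        *(lba & 0xFFFFFFFF).to_bytes(4, "big"),   # negative LBAs as two's complement
        *count.to_bytes(3, "big"),                # transfer length in sectors
        main_flags,
        subchannel,
        0,                                        # control byte
    ])
```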

Jackal wrote:

And as F1ReB4LL also pointed out on Discord, there seem to be many cases of discs with scrambled data in the Track02 pregap after offset correction, for example: http://redump.org/disc/1770/ + http://redump.org/disc/1716/ + http://redump.org/disc/7986/
And if I remember correctly, this disc http://redump.org/disc/5479/ also has garbage at the start of the audio. If you remove those bytes, the track matches the PS1 track, so it seems to have been ripped from the PS1 version. And IIRC the same was true for Fighting Force PC vs. PS1. But I'm not sure anymore, as it's been 13-15 years since those were first dumped; time sure flies.
It's unclear whether this is caused by, for example, the gold master disc being a CD-R that was burned track-at-once or something, but the most logical explanation is that an audio track was copied with offset garbage and then burned again. But this is a different issue that we don't have to discuss here?

Exactly, this is a known issue to me, and it even happens on some official PSX discs. We shouldn't touch these anyway, as it's part of the audio.

Jackal wrote:

IIRC Truong / Ripper theorized that erroneous sectors with garbage bytes at the end of a data track were the result of a "split sector" or "half sector" or whatever they called it, i.e. part data / part audio. If you check the scrambled output, is it data and zeroes interleaved, or does the data stop at some position with only zeroes after that?

So yes, it does look like a transition from data to audio, i.e. from scrambled to unscrambled, but there are some byte artefacts; I will make some hex screen captures later.

Jackal wrote:

But errors at the end of the data track also seem to be a different issue, and since the remainder of the disc is audio tracks, performing offset shift correction for such discs does not improve the dump in any meaningful way?

This effect also happens between data tracks (with no audio tracks involved): http://redump.org/disc/74810/ has only data tracks, and the second track is CD-XA video or something.

Jackal wrote:

There were some examples recently where DIC was leaving sectors scrambled inside a data track despite a correct sync/header and mostly correct data, resulting in different dumps than before. So either the default descrambling behavior was changed by sarami at some point, or it's a bug. If a sector is inside a data track and the vast majority of it is data, IMO there's no sense in leaving it scrambled, and the descrambled data is indeed more meaningful.

Yes, I saw that; at some point I think sarami changed something in DIC. This is the most important thing I'm trying to "fix" here. Regardless of whether shift correction is applied, I think we should not rely on sector content (or should rely on it less) when deciding whether to unscramble a sector. It's impossible to come up with a good decision algorithm if a sector is partially damaged, as pretty much any byte can be damaged, and this shift issue clearly demonstrates that.

Jackal wrote:

There were some examples recently where DIC was leaving sectors scrambled inside a data track despite a correct sync/header and mostly correct data, resulting in different dumps than before.

According to the commit log:
. 2016-05-14 to 2017-05-07 : skip descrambling when sync is invalid
. 2017-07-02 : skip descrambling when mode is invalid
. 2017-07-28 : skip descrambling when reserved area(0x814 - 0x81b) is invalid
. 2018-09-15 : support mode 0
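A rough Python approximation of those checks, for discussion only: it is not DIC's actual code, and applying the reserved-area test only to Mode 1 sectors is an assumption here (that 8-byte zero area exists in the Mode 1 layout, between EDC and ECC). It also shows the kind of content-based decision being criticized in this thread: a single damaged byte in the reserved area is enough to leave the whole sector scrambled.

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])


def dic_like_should_descramble(descrambled: bytes) -> bool:
    """Approximate the commit-log criteria on a sector that was already XORed.

    Returns False where the log says descrambling is skipped.
    """
    if len(descrambled) != 2352:
        raise ValueError("expected a raw 2352-byte sector")
    if descrambled[:12] != SYNC:          # invalid sync (2016-05-14 .. 2017-05-07)
        return False
    mode = descrambled[15]                # header: 3 MSF bytes, then the mode byte
    if mode not in (0, 1, 2):             # invalid mode (2017-07-02; mode 0 since 2018-09-15)
        return False
    if mode == 1 and any(descrambled[0x814:0x81C]):  # reserved area 0x814-0x81B (2017-07-28)
        return False
    return True
```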

superg wrote:

we should not rely on sector content (or should rely on it less) when deciding whether to unscramble a sector.

I agree, if the admin and other mods agree. I think the TOC, SubQ and sync should be checked.
1. Data track on TOC, Data sector on SubQ and sync is valid --- it's apparently "data" and there is no room for discussion.
2. Audio track on TOC, Audio sector on SubQ and no sync --- it's apparently "audio" and there is no room for discussion.
3. Data track on TOC, Data sector on SubQ, but there is no sync (or sync is damaged) --- is it "data" or "audio"?
4. Data track on TOC, Audio sector on SubQ, there is no sync (or sync is damaged) --- is it "data" or "audio"?
5. Data track on TOC, Audio sector on SubQ, there is a sync --- is it "data" or "audio"?
6. Audio track on TOC, Audio sector on SubQ, but there is a sync --- is it "data" or "audio"?
7. Audio track on TOC, Data sector on SubQ, there is no sync (or sync is damaged) --- is it "data" or "audio"?
8. Audio track on TOC, Data sector on SubQ, there is a sync --- is it "data" or "audio"?

10 (edited by user7 2023-03-03 15:56:17)

Jackal wrote:

- http://redump.org/disc/74810/ - Does the original disc play on a CD-i? What about a backup of the unfixed dump?

The unfixed dump is irrecoverably fucked and useless. Same with basically all VCDs I've found with this problem.

Redumper fixing the shifting offset makes it completely usable, thus truly preserving the ROM in a fully usable state.

All my posts and submission data are released into Public Domain / CC0.

superg wrote:

Yes, I saw that; at some point I think sarami changed something in DIC. This is the most important thing I'm trying to "fix" here. Regardless of whether shift correction is applied, I think we should not rely on sector content (or should rely on it less) when deciding whether to unscramble a sector. It's impossible to come up with a good decision algorithm if a sector is partially damaged, as pretty much any byte can be damaged, and this shift issue clearly demonstrates that.

Yes, that's the part I haven't liked for ages. IMHO data should be left as is, without any voodoo magic / analyzing of sector contents, etc.

user7 wrote:

Redumper fixing the shifting offset makes it completely usable, thus truly preserving the ROM in a fully usable state.

It should be possible to reconstruct such an image given some metadata in the comments section. If you want a working ISO, just generate it yourself using a special tool like redumper or something else.

sarami wrote:

I agree, if the admin and other mods agree. I think the TOC, SubQ and sync should be checked.
1. Data track on TOC, Data sector on SubQ and sync is valid --- it's apparently "data" and there is no room for discussion.
2. Audio track on TOC, Audio sector on SubQ and no sync --- it's apparently "audio" and there is no room for discussion.
3. Data track on TOC, Data sector on SubQ, but there is no sync (or sync is damaged) --- is it "data" or "audio"?
4. Data track on TOC, Audio sector on SubQ, there is no sync (or sync is damaged) --- is it "data" or "audio"?
5. Data track on TOC, Audio sector on SubQ, there is a sync --- is it "data" or "audio"?
6. Audio track on TOC, Audio sector on SubQ, but there is a sync --- is it "data" or "audio"?
7. Audio track on TOC, Data sector on SubQ, there is no sync (or sync is damaged) --- is it "data" or "audio"?
8. Audio track on TOC, Data sector on SubQ, there is a sync --- is it "data" or "audio"?

Some general things first. I think we should completely separate TOC and subchannel handling. On Photo CD, some data tracks are marked as audio in the TOC and as data in the subchannel. Some CD-i discs have a hidden track in the subchannel that is not listed in the TOC. Jaguar discs often have different track flags in the TOC vs. the subchannel.
We have two concepts here: the primary TOC-based split and the secondary subchannel-based split (Subs Indices).
For the TOC-based split we should use only data/flags from the TOC, and for the subchannel-based split we should use only data/flags from the subchannel. The only data we want to take from the subchannel for a TOC-based split is for finding track split points, as this data is not present in the TOC.

(1),(2) - strongly agree

(3) - no sync, I'd say it's audio. But I'd add one exception to that rule. There are some CD-i discs where the whole sync is zeroed but the data is scrambled (it's more than one disc), so ideally:
if (standard_sync || (zeroed_sync && expected_scrambled_msf)) it's data. In general, I find the MSF to be a very handy and strong check for scrambled data.
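The rule above can be written out as a small Python sketch. It assumes the sector is the raw 2352-byte frame as read (still scrambled) and that `lba` is the LBA where it was expected. "Expected scrambled MSF" is interpreted here as the BCD MSF for that LBA XORed with the first three bytes of the ECMA-130 scrambler sequence (0x01, 0x80, 0x00); that interpretation and the function names are assumptions, not redumper's code.

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])
SCRAMBLER_HEAD = bytes([0x01, 0x80, 0x00])  # first scrambler bytes, covering the MSF


def to_bcd(value: int) -> int:
    return ((value // 10) << 4) | (value % 10)


def header_msf(lba: int) -> bytes:
    """BCD minute/second/frame as written in a data sector header (LBA 0 = 00:02:00)."""
    if lba < -150:
        raise ValueError("lead-in MSF wraps around; not handled in this sketch")
    frames = lba + 150
    return bytes([to_bcd(frames // 4500), to_bcd(frames // 75 % 60), to_bcd(frames % 75)])


def looks_like_data(sector: bytes, lba: int) -> bool:
    standard_sync = sector[:12] == SYNC
    zeroed_sync = sector[:12] == bytes(12)
    scrambled_msf = bytes(m ^ s for m, s in zip(header_msf(lba), SCRAMBLER_HEAD))
    expected_scrambled_msf = sector[12:15] == scrambled_msf
    # && binds tighter than ||, so the parentheses match the rule as written
    return standard_sync or (zeroed_sync and expected_scrambled_msf)
```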

(6) - definitely audio

(4),(5),(7),(8) - these are a source of discrepancies because of the mix of TOC and subchannel properties.
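superg's answers so far can be summarized as a small decision function (Python sketch). The inputs are assumed to be already-evaluated booleans: whether the TOC and SubQ mark the sector as data, and whether the sync check passed (including the CD-i zeroed-sync exception above). `None` marks the combinations that are still open.

```python
from itertools import product
from typing import Optional


def classify(toc_data: bool, subq_data: bool, sync_ok: bool) -> Optional[str]:
    if toc_data and subq_data:
        return "data" if sync_ok else "audio"  # (1) data / (3) no or damaged sync -> audio
    if not toc_data and not subq_data:
        return "audio"                         # (2), and (6) even if a sync is present
    return None                                # (4),(5),(7),(8): TOC/subchannel mismatch


if __name__ == "__main__":
    for toc, subq, sync in product((True, False), repeat=3):
        print(f"toc_data={toc} subq_data={subq} sync_ok={sync} -> {classify(toc, subq, sync)}")
```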

13 (edited by sarami 2023-03-05 09:33:19)

sarami wrote:

2017-07-28 : skip descrambling when reserved area(0x814 - 0x81b) is invalid

At least Jackal, reentrant and superg don't agree with this. I'll delete this code. Some discs with erroneous sectors will need to be redumped.

reentrant wrote:

It should be possible to reconstruct such an image given some metadata in the comments section. If you want a working ISO, just generate it yourself using a special tool like redumper or something else.

I totally agree. A Fixed Dump page (e.g. http://redump.org/disc/99290/) isn't needed, and I want to delete it.

superg wrote:

(3) - no sync, I'd say it's audio. But I'd add one exception to that rule.

Then, a sector with a damaged sync is "audio", ok?

superg wrote:

There are some CD-i discs where the whole sync is zeroed but the data is scrambled (it's more than one disc), so ideally:

Tell me the URLs of those discs in this site's database.

superg wrote:

(6) - definitely audio

I think so. Some Sega Saturn and CD-i Ready discs (and others) have it (e.g. http://redump.org/disc/58172/, http://redump.org/disc/35804/).

superg wrote:

(4),(5),(7),(8) - these are a source of discrepancies because of the mix of TOC and subchannel properties.

You say, "for the subchannel-based split we should use only data/flags from the subchannel". Redump.org adopts the "subchannel-based split" except for TOC vs. Subs desync discs. Then (7),(8) should be descrambled in accordance with the subchannel, and (4),(5) should not be descrambled, also in accordance with the subchannel.

14 (edited by superg 2023-03-05 17:45:53)

sarami wrote:

I totally agree. A Fixed Dump page (e.g. http://redump.org/disc/99290/) isn't needed, and I want to delete it.

I just want to make sure we are on the same page here. The "Fixed Dump" concept is different: it's not "fixed descrambling", it's a split according to the offset shift. Check the offset table I posted for http://redump.org/disc/74810. If you split according to these offsets, you will have a fully functional image. Otherwise, if you split using only the offset from the first track, the image will not work in emulators or when burned in any mode.
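A minimal Python sketch of the difference between the two splits, assuming a raw stream read without offset correction whose byte 0 corresponds to LBA `raw_start_lba` at zero offset, and an offset table like the one above given as (first LBA, last LBA, offset in samples). The names are hypothetical; it only illustrates the arithmetic, not redumper's implementation.

```python
from typing import Dict, Optional, Sequence, Tuple

BYTES_PER_SECTOR = 2352
BYTES_PER_SAMPLE = 4
Range = Tuple[int, int, int]  # (first LBA, last LBA, offset in samples)


def offset_for(lba: int, ranges: Sequence[Range]) -> int:
    for first, last, offset in ranges:
        if first <= lba <= last:
            return offset
    raise KeyError(f"no offset range covers LBA {lba}")


def extract_sector(raw: bytes, raw_start_lba: int, lba: int, offset: int) -> bytes:
    """Cut one 2352-byte sector out of the raw stream at the given sample offset."""
    pos = (lba - raw_start_lba) * BYTES_PER_SECTOR + offset * BYTES_PER_SAMPLE
    if pos < 0 or pos + BYTES_PER_SECTOR > len(raw):
        raise ValueError(f"LBA {lba} falls outside the raw stream")
    return raw[pos:pos + BYTES_PER_SECTOR]


def split_image(raw: bytes, raw_start_lba: int, lbas: Sequence[int],
                ranges: Sequence[Range], global_offset: Optional[int] = None) -> Dict[int, bytes]:
    """global_offset set: one offset for the whole disc (default behaviour).
    global_offset None: follow the per-range offsets (like --correct-offset-shift)."""
    out = {}
    for lba in lbas:
        offset = global_offset if global_offset is not None else offset_for(lba, ranges)
        out[lba] = extract_sector(raw, raw_start_lba, lba, offset)
    return out
```

With a single global offset, every sector after a shift is cut a few bytes off its real boundary, which is why the sync/header no longer line up and descrambling fails there.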

sarami wrote:

(3) - no sync, I'd say it's audio. But I'd add one exception to that rule.
Then, a sector with a damaged sync is "audio", ok?

Yes, I would agree to that.

sarami wrote:

Tell me the URLs of those discs in this site's database.

Sure, I'll check my test dumps and share them.

sarami wrote:

I think so. Some Sega Saturn and CD-i Ready discs (and others) have it (e.g. http://redump.org/disc/58172/, http://redump.org/disc/35804/).

Yeah, a couple of PSX discs also have this data spillover into audio.

sarami wrote:

You say, "for the subchannel-based split we should use only data/flags from the subchannel". Redump.org adopts the "subchannel-based split" except for TOC vs. Subs desync discs. Then (7),(8) should be descrambled in accordance with the subchannel, and (4),(5) should not be descrambled, also in accordance with the subchannel.

No, I don't think we do what you describe. Maybe it simply wasn't discussed in detail before, and we just follow how it was implemented in DIC. TOC/subchannel mismatch is very confusing for everybody, and we absolutely have to clarify and formalize it.

Here's what data we collect for the cue sheet / track split, and where it's available:
1. Count of sessions (available in TOC but can be derived from subchannel too if other session lead-in is included)
2. Count of tracks (available in both TOC/subchannel)
3. Data track flags: data/audio, 4ch, dcp, pre (available in both TOC/subchannel)
4. Track index 01 (available in both TOC/subchannel)
5. Other indices: index 00, index 02+ (available only in subchannel)
6. MCN/ISRC (available in both TOC/subchannel)
7. Other CD-TEXT (available only in TOC)

As you can see from this list, TOC and subchannel share almost everything:
TOC: (1)(2)(3)(4)(6)(7)
subchannel: (1)(2)(3)(4)(5)(6)

For the TOC-based split, the primary source of truth is the data from the TOC.
On the other hand, for the subchannel-based split, the source of truth is the data from the subchannel.
It's only logical to follow this rule for every data type we extract, as it removes the confusion and separates the concepts.
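The rule can be expressed as a lookup over the field list above (Python sketch; the field names are made up for illustration). Session count is marked as available in the subchannel, although per the list that only holds when the other session's lead-in is included. `None` means the chosen source doesn't carry the field; what to do in that case isn't settled here.

```python
from typing import Optional

# field -> (available in TOC, available in subchannel), per the list above
AVAILABILITY = {
    "session_count": (True, True),   # subchannel only if the other session lead-in is included
    "track_count": (True, True),
    "track_flags": (True, True),
    "index_01": (True, True),
    "other_indices": (False, True),  # index 00, index 02+
    "mcn_isrc": (True, True),
    "cd_text": (True, False),
}


def source_of_truth(field: str, split: str) -> Optional[str]:
    in_toc, in_subchannel = AVAILABILITY[field]
    if split == "toc":
        if in_toc:
            return "toc"
        # exception stated earlier: split points missing from the TOC come from the subchannel
        return "subchannel" if field == "other_indices" else None
    if split == "subchannel":
        return "subchannel" if in_subchannel else None
    raise ValueError(f"unknown split type: {split}")
```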

At redump.org, saying that we prefer the TOC basically means that all data from the TOC should have the highest priority.

I decided to handle both dumps like the Subs vs. TOC desync disc (http://redump.org/disc/37134/): "TOC control" takes priority.