1 (edited by Jackal 2018-04-04 20:37:46)

limbo43 is dumping a lot of Xbox USA discs, but some of his results differ from existing dumps, which leaves me worried about the dumping methods in use: whether Xbox Backup Creator (XBC) is a safe dumping method, and whether FreeCell dumps discs correctly (without the Kreon drive injecting any data that shouldn't be there) regardless of the drive's (unlock) state.

The first one that wasn't matching is Need for Speed: Most Wanted: Black Edition: http://redump.org/disc/50146/
The previous dump by h0lylag was changed to red status because it was supposedly bad, but without a real confirmation or a redump by either dumper. I've sent h0lylag a PM, but I don't know if he'll ever be back to read it.

The second one is Toxic Grind: http://redump.org/disc/50472/ vs. http://redump.org/disc/49392/

We really need to determine which of the dumps is wrong and what's causing the difference (and possibly find a way to detect bad dumps), before we end up with hundreds of potentially bad dumps.

I'm able to dump with a Kreon and also with a 0800v3 using FreeCell or XBC. I'll check some discs with all combinations and compare them as soon as I have some spare time, but that'll only be PAL discs.

Other dumps are matching: http://redump.org/disc/49180/ so I guess only a couple of discs are affected by whatever is causing this.

4 (edited by limbo43 2018-04-04 22:53:22)

I took a look at the differences between my NFS dump and h0lylag's, and the only difference was about 12K of data immediately following the layerbreak. On h0lylag's dump that area was all 0's, and on mine it was random(?) data. I extracted both images and compared the files, and they are all identical.
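For anyone who wants to repeat this kind of comparison, here is a minimal sketch in Python. It assumes both dumps are plain images with 2048-byte sectors; the function names are just for illustration and are not part of any existing tool.

```python
SECTOR_SIZE = 2048  # user-data bytes per DVD sector


def differing_sectors(path_a, path_b, sector_size=SECTOR_SIZE):
    """Return the sector indices at which two image files differ."""
    diffs = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            sector_a = a.read(sector_size)
            sector_b = b.read(sector_size)
            if not sector_a and not sector_b:
                break  # both files exhausted
            if sector_a != sector_b:
                diffs.append(index)
            index += 1
    return diffs


def to_ranges(indices):
    """Collapse a sorted list of sector indices into (first, last) runs."""
    ranges = []
    for i in indices:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)
        else:
            ranges.append((i, i))
    return ranges
```

Calling `to_ranges(differing_sectors("mine.iso", "other.iso"))` (file names are placeholders) gives a short list of sector runs, which makes it easy to see whether the differences are confined to one small area such as the one after the layerbreak.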

It made me wonder if he's using XBC or FreeCell. If he's using XBC he may not have noticed those sectors being 0'd out due to a read error (vs. being SS areas). But based on his posts it looks like he has a Kreon drive, so it's almost certainly FreeCell. So I'm not sure what happened here.

I haven't looked at the differences on Toxic Grind yet but will take a look.

Just to be clear, all of my dumps are via Kreon+FreeCell with only one exception (a disc that was scratched and was able to read properly in an 0800 drive w/XBC). I actually can't remember which disc that was right now but it wasn't NFS or Toxic Grind.

5 (edited by limbo43 2018-04-05 05:02:36)

Alright, checking out Toxic Grind now. This time, it's my dump that has zeroes right after the layerbreak. Looking at Starshadow's copy, his has the random data. In this case it's sectors 1913776-1913795 that are the culprits.

But it gets worse... I just re-ripped it with FreeCell on the same drive and it didn't spit out zeroes for those sectors anymore. The data now is a 1:1 match for Starshadow's dump.

So something seems wrong with FreeCell here. I'm looking at the source code and it doesn't seem to do anything special that might cause this. The code simply retries after any drive error (5 times) and then fails out completely, so the drive/firmware cannot have reported an error for these sectors.
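To illustrate why a retry loop doesn't help in this situation, here is a minimal sketch in Python. It is not FreeCell's actual code: `DriveError`, `read_with_retries` and the `read_sector` callback are hypothetical names, and treating "5 times" as five attempts in total is an assumption.

```python
class DriveError(Exception):
    """Hypothetical stand-in for an error reported by the drive."""


def read_with_retries(read_sector, lba, max_attempts=5):
    """Call read_sector(lba), retrying only when the drive reports an error."""
    last_error = None
    for _ in range(max_attempts):
        try:
            # A sector that comes back as zeroes WITHOUT an error is
            # returned right here; the retry path is never taken.
            return read_sector(lba)
        except DriveError as err:
            last_error = err
    raise RuntimeError(
        f"sector {lba} failed after {max_attempts} attempts"
    ) from last_error
```

The point of the sketch is that retries only trigger on a reported error. If the drive hands back zero-filled data and reports success, the loop returns it on the first attempt and the dump completes without any warning.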

The current FreeCell source is here, for reference:
https://github.com/claunia/freecell/blob/master/main.cc

This is bad because it's not just my dumps that might be inconsistent. Although the zeroed area may vary in size, so far it appears to be localized to some number of sectors immediately following the layerbreak. I don't have a definitive guide to the data layout on the disc, but FreeCell just treats all that stuff as "game data" without differentiating any kind of special padding after the layerbreak. If anyone has additional info it'd be most helpful to debug this.

I'm going to scan all my dumps and check 2-3 sectors after layerbreak to see how many of them have zeroes...
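A scan like that could look roughly like the following Python sketch. It assumes a plain image with 2048-byte sectors and that you already know which image sector is the first one after the layerbreak; `start_sector` has to be supplied by the caller, the script does not work it out.

```python
def zero_sectors_after(path, start_sector, count=3, sector_size=2048):
    """Return the sectors in [start_sector, start_sector + count)
    that consist entirely of zero bytes."""
    zero = bytes(sector_size)
    hits = []
    with open(path, "rb") as f:
        f.seek(start_sector * sector_size)
        for i in range(count):
            data = f.read(sector_size)
            if len(data) < sector_size:
                break  # ran off the end of the image
            if data == zero:
                hits.append(start_sector + i)
    return hits
```

For example, `zero_sectors_after("toxic_grind.iso", 1913776, count=20)` (the file name is a placeholder) would list which of the sectors 1913776-1913795 are fully zeroed. An empty list only means no fully zeroed sector was found in that window, not that the dump is good.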

There's also a small possibility there's a bug in the SS sector skipping code in FreeCell but that seems like it'd be deterministic.

6 (edited by limbo43 2018-04-05 05:23:05)

Out of 167 sequential dumps with FreeCell, 31 of them have 0's in the area just after the layerbreak and arguably need to be redumped. That's 18.6% of all of my dumps, or nearly 1 in 5 dumps that have this behavior with FreeCell. Doesn't seem good.

It is likely many of the dumps on Redump need to be analyzed to check for this missing data.

Edit:

Looking in my bash history I can tell that all of these zero-padding-at-the-layerbreak dumps happened on the same drive. I have three identical Kreon drives in my machine and have been dumping games with all three at the same time. So something is up with this specific drive--but it doesn't do it every time. As I mentioned above, I just used the faulty drive to re-dump Toxic Grind and the checksum matched this time.

So there's something up with one of my drives, and with at least one other person's drive (h0lylag, who dumped NFS and got the zero padding too).

7 (edited by Jackal 2018-04-05 06:36:39)

Thanks for bringing us closer to a solution. I guess you should try to eject and re-insert the disc after closing XBC (for dumping the DMI/PFI/SS) and before starting FreeCell. If that by any chance fixes the problem, then we need to add a check to FreeCell to make sure that the drive is in the correct mode/state before dumping. Otherwise it might be a firmware bug?

Also, it should be easy to find any other bad dumps if someone creates a tool to scan these sectors.

And it would be worth checking out if this issue also affects Xbox 360 in any way (!)

Jackal wrote:

Thanks for bringing us closer to a solution. I guess you should try to eject and re-insert the disc after closing XBC (for dumping the DMI/PFI/SS) and before starting FreeCell. If that by any chance fixes the problem, then we need to add a check to FreeCell to make sure that the drive is in the correct mode/state before dumping. Otherwise it might be a firmware bug?

Also, it should be easy to find any other bad dumps if someone creates a tool to scan these sectors.

And it would be worth checking out if this issue also affects Xbox 360 in any way (!)

I don't think this would be related to ejecting because it only affected one of my drives, and the routine is identical for all three drives. It seems drive or firmware related.

I always dump my discs using a 0800v3 drive/XBC and a Kreon drive/FC. Never encountered such a problem. The dumps always match each other, regardless of which disc I've dumped (Xbox/Xbox360). Only XGD3 cannot be dumped using the Kreon.

Possibly relevant? http://forum.redump.org/topic/15998/ori … ergh-disc/

11 (edited by limbo43 2018-04-08 21:54:59)

Still working on figuring this out. I used the same drive to re-dump several of the affected games and they then dumped correctly. Further analysis of my bash history reveals that most of these bad games were dumped back-to-back; I think the drive or laser was in an odd state that persisted across discs.

One suspicion I have is that the Kreon firmware's "error skipping" feature could be a culprit, but I don't know exactly what it does. If it returns zero bytes for what seem to be bad sectors instead of failing, that might be why. FreeCell does not send the CDB to enable or disable Kreon's error skipping feature, so if it's on by default, that could be a problem. According to the NFO, error skipping is enabled by default for "360 games", but maybe it is on for both? If certain errors that _should_ bubble up do not, FreeCell can produce silently corrupted dumps. So if this ends up being the culprit, FreeCell will need a patch to issue that CDB. (It's also possible that this is unrelated and the error skipping feature isn't on by default.)

So far I have not reproduced the error condition--every dump I do with the "bad" drive is correct. I've tried continuously looping multiple dumps on all 3 drives at the same time to simulate the conditions and have come up short. It's driving me crazy that I can't get it to happen again.

Affected dumps may be harder to detect than I thought. The most obvious ones just have all zeroes for 2-6 sectors immediately following the layerbreak, but there are some examples where there is a random perforation of zeroes in those sectors instead. This means it may be nearly impossible to detect good vs. bad dumps without a true redump by multiple people. Since we know I'm not the only affected user (h0lylag's NFS dump comes to mind) this could mean that there's a risk to any non-redumped Xbox title in the database today. Scary thought.
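For the "perforated" case, a per-sector zero profile can at least point out candidates worth a second look. This is a rough Python sketch under the same assumptions as before (plain image, 2048-byte sectors, caller supplies the first sector after the layerbreak); the function names are made up for illustration. Legitimate game data can contain zeroes too, so the numbers are a hint, not proof of a bad dump.

```python
def longest_zero_run(data):
    """Length of the longest run of consecutive zero bytes in data."""
    longest = current = 0
    for byte in data:
        if byte == 0:
            current += 1
            if current > longest:
                longest = current
        else:
            current = 0
    return longest


def zero_profile(path, start_sector, count, sector_size=2048):
    """Return (sector, zero_byte_count, longest_zero_run) for each
    full sector in [start_sector, start_sector + count)."""
    rows = []
    with open(path, "rb") as f:
        f.seek(start_sector * sector_size)
        for i in range(count):
            data = f.read(sector_size)
            if len(data) < sector_size:
                break  # ran off the end of the image
            rows.append(
                (start_sector + i, data.count(b"\x00"), longest_zero_run(data))
            )
    return rows
```

Sectors with long zero runs next to neighbours that look like random data are the ones to compare against a second dump. A matching redump from a different drive is still the only real confirmation.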

Still trying to repro and will update once I have more information.

You could do a test. Scratch the disc and try to read it...

reentrant wrote:

You could do a test. Scratch the disc and try to read it...

I did try editing sectors.txt to intentionally not provide a few SS areas to see what the drive would do. FreeCell continued to function and the drive returned what appeared to be random data for those sectors. Not sure if it's the same issue, though.

I think I've gotten closer to the root cause. After redumping the 26 games that had this problem, I noticed that the affected drive begins failing the same way after being used continuously for a certain amount of time. It seems like it's an overheating or mechanical stress issue that affects the drive in such a way that the laser is not refocusing on the second layer fast enough at the layerbreak. As a result the drive is reading zeroes when it should read data. I don't know enough about the drive internals and hardware debugging to completely isolate the issue, but I found that letting the drive cool off is enough to get another clean dump, and if I do a lot of dumps back-to-back it eventually fails consistently.

Therefore, I'm trashing this drive, but we now know what one symptom of a bad drive looks like that is not detected by existing utilities. We have only found one dump so far besides my own that shows this issue, and I already redumped that game, but I will do a deeper dive soon.

I am posting a fix thread now with new checksums for all of the affected games:
http://forum.redump.org/post/59983/#p59983

By the way, I wrote something to check the feature set of my drives with Kreon, and the "error correction" stuff was explicitly not available on my model. Furthermore, I found some posts by Kreon himself explaining that the error correction just shortens how long the drive will lock up/retry internally before unblocking and returning a sense error. So that wasn't related at all.