1 (edited by Eidolon 2008-01-15 13:58:37)

Hi y'all, I have about 50-ish SegaCD games, 50-ish Saturn games, 50-ish Dreamcast games, and a couple of PS1 games to contribute to the database.

While doing some research on the net, I also found tosec.org, which takes a similar approach to the matter and also has a large database. What's this site's relationship with them? Wouldn't it be a good idea to merge the databases?

Anyway, the process of making the "best possible rips" is very cumbersome. Even for only my few games it seems a little like overkill, at least from a time consumption point of view. Is there any chance that programs might be released to automate the process?

Best regards,
Eidolon
http://www.eidolons-inn.net

Welcome

We've tried working together with them in the past, because we too felt it would be better to have one database for all dumps. TOSEC, however, didn't want to give up their dumping method, because redumping all their discs using our new method (raw data tracks, write offset correction on audio, no RIFF headers added to audio) wouldn't be possible (we were told that a lot of the dumped discs could not be redumped anymore because they had already been sold on, etc.). There have also been some differences of opinion on the best way to dump discs, which led to arguments between certain members.

With that said, a decent number of people who dumped for TOSEC have also dumped PSX games for our project. We've added new systems over the past months not to try and replicate what TOSEC is doing, but because there was demand from people who wanted to dump their discs with what they believe is the best method. CD dumps from TOSEC (except the data-only ones like 3DO, for which the error correction data can be regenerated) sometimes have missing samples in the first and last audio tracks, samples that couldn't be dumped due to EAC limitations and the lack of write offset correction. TOSEC doesn't think of this as a big deal, but we do, because we want perfect 1:1 dumps, even if it's just a few bytes.

I'd still recommend that people with Dreamcast discs team up with TOSEC for that system, because they're better organized there (we have 4 dumps and they have hundreds). The only difference for that system is the write offset correction, which in the end only comes down to some shifted bytes.

As for automated tools, the only one we have so far is PerfectRip. The version that we are testing was abandoned, and the new version won't be out for approx. 6 months. Anyway, the version that we are testing works pretty well on Plextor drives, so if you have a Plextor drive I could send you a test version (unfortunately, other brands don't seem to work that well).

It's not much use comparing the two dumping methods (Redump vs TOSEC), because (as harsh as it may sound) we consider their method to be obsolete. We started out with a method that was almost identical to TOSEC's. Soon after, we discovered how to detect the write offset of a disc and how to correct it. The current method is final, even though there's still room for improvement in the speed area. Once you get used to it, each disc will only take a couple of minutes to dump (unlike EAC, PerfectRip can automatically rip data and audio tracks with the correct gaps).

ps. for SegaCD dumps we work together with the no-intro project - http://gbadat.altervista.org/

ps.2 A Google search for redump.org brought up this topic: http://forums.segaxtreme.net/showthread … amp;page=2 so I assume that's where you came from :) if you have any more questions, let us know

3 (edited by Eidolon 2008-01-17 00:29:42)

Hi, thanks for your warm welcome.

However, after further discussing the issue with people over at SegaXtreme (your link) and my site (Eidolon's Inn - relevant discussion thread), I have decided that making perfect dumps is pretty pointless, at least for the time being, while it's such a hassle to produce them.

Thanks anyway, and good luck with your project - on a free weekend I will at least try to reproduce the results some of your users had, since I own some of the same SegaCD and Saturn games which are already in your DB.

Regards,
Eidolon.

Eidolon wrote:

I have decided that making perfect dumps is pretty pointless, at least for the time being, while it's such a hassle to produce them.

Perfect dumps are for history, like MAME. If you just want to play, bin/cue is enough. But if you want to save a perfect digital copy of your CD, you should spend some time on it. Remember that CDs aren't eternal...

Nice to see you here, Eidolon, I've visited your site for a long time.

As F1ReB4LL said, it doesn't matter how long it takes to make perfect dumps, since they are forever. Also, perfect means unique images, which makes it possible to start other projects like the No-Intro Screenshot Archive ( http://no-intro.dlgsoftware.net/main.php?lang=1 , sorry for the little spam ;) ). Without a valid dat, you won't be able to start any massive project, ever.

Best regards.

Eidolon wrote:

Anyway, the process of making the "best possible rips" is very cumbersome. Even for only my few games it seems a little like overkill, at least from a time consumption point of view. Is there any chance that programs might be released to automate the process?

Eidolon wrote:

I have decided that making perfect dumps is pretty pointless.

*sigh* This is nothing; it only takes a couple of minutes of your time to set up (not counting the time the programs spend actually dumping, during which you can be doing other things). A genuinely time-consuming, but extremely worthy, project is the [Software Preservation Society] effort to preserve magnetic media games.

Lucky for them, they don't have a forum where people can come along and say how pointless the work they're trying to accomplish is.

Three years and still going strong.

Also, the replies I got from the guys at redump.org lead me to the conclusion that it is simply too much hassle to go after "perfect" dumps. There is no such thing as a perfect dump.

Judging by Eidolon's quote from his forums, I think he misinterpreted my post. We DO believe that our method creates perfect dumps. Perfect in the sense that every single byte of data on the CD is secured (except the lead-in, lead-out and subchannel data, which are irrelevant).

To sum up the differences again between our method and the TOSEC one:

- TOSEC extracts the cooked 2048 bytes/sector data track from a raw image, dropping the 304 bytes per data sector that are on the CD (many systems like PSX require the complete main channel, i.e. all 2352 bytes/sector, to be ripped; there's no real point in having a 2048 bytes/sector data track for preservation, regardless of what other people say... audio is 2352 bytes/sector and so is data).
- TOSEC leaves out the track02 pregap, because older EAC versions were not capable of dumping it. This pregap can contain relevant data, which is then missing from the resulting dump.
- TOSEC adds a 44-byte RIFF (wav) header to each audio track that isn't on the CD.
- TOSEC corrects the read offset only. Redump corrects both the read and the write offset, allowing audio tracks to be put back into the position they were in BEFORE manufacturing (see the sketch below). This has proven to be a more sensible way of dealing with audio than just correcting the read offset, because after read+write offset correction we have audio tracks that start exactly at the intended position (often right after the pregap, at byte 352800 for instance) and that have matching checksums across different regions for discs with different write offsets (these tracks would have different checksums using the TOSEC method). Some examples: http://redump.org/disc/1777/ http://redump.org/disc/447/ , all Mortal Kombat Trilogy versions, and lots more.
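To make the idea concrete, here's a minimal sketch of what combined read+write offset correction amounts to. The helper name and the sign convention are mine for illustration; this is not the actual PerfectRip/EAC code:

```python
# Offsets are measured in samples; CD audio samples are 4 bytes each
# (16-bit stereo). The sign convention here is illustrative only -
# real tools define positive/negative per their own documentation.
BYTES_PER_SAMPLE = 4

def correct_offset(audio: bytes, combined_offset: int) -> bytes:
    """Shift the audio stream by the combined (read + write) offset,
    padding the exposed end with digital silence so the length stays
    the same."""
    shift = combined_offset * BYTES_PER_SAMPLE
    if shift > 0:
        # data sits 'shift' bytes too late: drop leading bytes, pad tail
        return audio[shift:] + b"\x00" * shift
    if shift < 0:
        # data sits too early: pad the start, drop the tail
        return b"\x00" * -shift + audio[:shift]
    return audio
```

Apart from those shifted bytes nothing changes, which is why a TOSEC and a Redump audio track of the same disc differ only in where the data sits, not in the data itself (assuming no samples were pushed off the track edges).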

If you still think it's too much work for you (even with PerfectRip), then that's alright. I just wanted to explain why we believe that our method IS perfect. Perfect in the sense that the full contents of the CD are preserved in the best possible way. There is still room for improvement in the speed and difficulty of the dumping process, but the output that is achieved is final.

Snake (Eidolon's Inn - relevant discussion thread) wrote:

BIN/CUE *IS* the best format to use for this stuff. CDRWIN all the way baby.

CDRWIN doesn't have any error correction/verification for audio tracks. You can lose half of a track and it won't tell you about it (a rare case, but still). You can also get a dump full of clicks in the audio. Audio tracks should be ripped with EAC only, and it should show "OK" for every track. As for Alcohol 120%, there's a long-standing bug: it produces wrong cuesheets, so never rip anything into bin/cue with Alcohol.

9 (edited by Eidolon 2008-01-18 19:13:15)

Let's first focus on Sega CD and Saturn games. The structure of those CDs is pretty simple:
- a mode 1 data track, with 2048 bytes of user data per sector
- 0 to many audio tracks

The conclusion of the discussion on the Inn was the following.

There definitely IS the problem that every drive has a slight offset in reading the audio tracks. However, this offset is less than 1 sector of the CD, i.e. less than 1/75th of a second. So, practically, it does not matter at all!

So, we have concluded that the drive offset introduces a problem in checksumming methodology, not a problem in ripping methodology.

So you need an INTELLIGENT checksumming tool which calculates the CRC32 for the data track and the audio tracks within the BIN file separately, adjusting dynamically for the drive offset.

Practically, that means that there might still be several BIN/CUE images of a game floating around, with different checksums for the BIN as a whole. But the idea is that for all those BIN files, the "intelligent checksum" is always the same, thus enabling comparability! The BIN/CUE files may have an audio offset of a few 1/75ths of a second. That in itself is irrelevant, because you cannot predict the offsetting which happens when burning a dump back to a real CD, or playing it in a real system.

Concerning hard errors in reading the audio tracks, these can simply be checked for by ripping the CD twice. If the resulting BIN files are the same, the dump is OK. If the resulting BIN files differ in the audio track data, the CD is badly scratched and the drive's error correction kicks in, producing different results with each rip. Meaning, the disc is unsuitable for a "perfect" rip anyway.
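That check boils down to a byte-for-byte comparison of the two images; a trivial sketch (file names are made up):

```python
# Rip the disc twice, then compare the two BIN files. Matching hashes
# mean the drive read the disc consistently; differing hashes suggest
# scratches and drive-side error concealment.
import hashlib

def file_sha1(path: str) -> str:
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print("match" if file_sha1("rip1.bin") == file_sha1("rip2.bin") else "differ")
```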

Consequently, I've begun working on the GoodSegaCD and GoodSaturn projects on the Inn, hoping that this slightly easier method will be adopted by the Sega retrogaming community as a new de facto standard (similar to the GoodGen stuff).

Looking forward to hearing your thoughts on this!

There is no such thing as "intelligent checksums". There has to be a standard for how reference checksums are calculated before you can disregard the data offset. We prefer to take both the read and the write offset into account when determining the reference, allowing audio tracks (when saved using the standard) to have identical checksums across different regions/games, rather than just looking at data integrity the way you are planning to do. It makes me wonder if the benefit in speed will really be that great, because even a minor scratch on any of your CDs will give you problems dumping and verifying them the 'fast' way. Sooner or later you will probably end up using EAC after all. Anyway, good luck with your projects.

As for GoodGen, I like No-Intro's dat better, because it's more accurate. And that's before we even get to MAME and how they want to preserve the Sega Genesis ROMs (splitting the data into separate files exactly as they are stored on the actual ROM chips). Most people consider this 'pointless' too, while others don't (see the resemblance?).

Vigi wrote:

There is no such thing as "intelligent checksums". There has to be a standard for how reference checksums are calculated before you can disregard the data offset. We prefer to take both the read and the write offset into account when determining the reference, allowing audio tracks (when saved using the standard) to have identical checksums across different regions/games, rather than just looking at data integrity the way you are planning to do.

Yes, there has to be a standard. We propose the following one. Only do the checksum on the following audio data:
- starting at the first non-zero byte in the audio data section after the data track
- ending at the last non-zero byte in the audio data section before the end of the file

The positioning of this audio data block is different in the BIN files, depending on the audio offset of the drive. But, the checksum of the audio data block will be the same regardless of that audio offset.

In combination with the checksum of the data block, and the CUE sheet which lists the tracks, this allows for 100% verification of good (I would even call them perfect according to the Red Book standard) SegaCD game dumps.
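Expressed as code, the proposal is just a few lines; this is a sketch of the idea (names are made up), not a finished tool:

```python
# "Intelligent checksum": CRC32 over the audio block trimmed to its
# first and last non-zero bytes, so the value no longer depends on
# where the drive's offset placed the block inside the BIN file.
import zlib

def trimmed_audio_crc32(audio: bytes) -> int:
    start = 0
    while start < len(audio) and audio[start] == 0:
        start += 1
    end = len(audio)
    while end > start and audio[end - 1] == 0:
        end -= 1
    return zlib.crc32(audio[start:end]) & 0xFFFFFFFF
```

Note this assumes the audio data is actually enclosed in silence at both ends of the block.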

Vigi wrote:

It makes me wonder if the benefit in speed will really be that great, because even a minor scratch on any of your CDs will give you problems dumping and verifying them the 'fast' way.

But why is that? Doesn't that affect your own ripping method as well? If there really is a bad scratch in the audio data section of the disc, there is NO WAY to reproduce the original bytes in those particular sectors.

In our method, we simply rule that out by requiring the CD to be ripped twice. If the resulting BIN files show no difference, that is proof that the drive had no problems reading the audio data and can do so reliably over and over again.
A badly scratched CD (and I have one to confirm this) will always produce different results from the drive's internal error correction mechanism, yielding a different BIN file with every dump.

Vigi wrote:

Sooner or later you will probably end up using EAC after all. Anyway, good luck with your projects.

The problem with EAC is that it can't cope with data tracks or extract them directly, and in the end you will end up with one file per track.
I love the simplicity of having one BIN file per CD, and that is difficult to achieve.

I would like to give that "PerfectRip" program a try, though - I didn't find it on this site, and Google results were inconclusive.

Vigi wrote:

As for GoodGen, I like No-Intro's dat better, because it's more accurate. And that's before we even get to MAME and how they want to preserve the Sega Genesis ROMs (splitting the data into separate files exactly as they are stored on the actual ROM chips). Most people consider this 'pointless' too, while others don't (see the resemblance?).

Life is a constant stream of defining goals, weighing methods to achieve them, and adjusting the goals to fit those methods... ;)

Eidolon wrote:

I would like to give that "PerfectRip" program a try, though - I didn't find it on this site, and Google results were inconclusive.

PerfectRip doesn't work on all drives, IIRC only on _some_ Plextors...

13 (edited by themabus 2008-01-19 11:53:31)

Eidolon wrote:

Yes, there has to be a standard. We propose the following one. Only do the checksum on the following audio data:
- starting at the first non-zero byte in the audio data section after the data track
- ending at the last non-zero byte in the audio data section before the end of the file

The positioning of this audio data block is different in the BIN files, depending on the audio offset of the drive. But, the checksum of the audio data block will be the same regardless of that audio offset.

often it won't; you'd miss audio at the start of the first track or the end of the last. e.g. [T-76044] Winning Post (J) has audio data right up to the very last byte of the last track. if the drive shifts that data outside even by 1 sample, the crc will differ. and even if there is silence at both ends of the audio data, it does not always compensate for the offset. you're right - drive offsets are pretty small most of the time, but the cd write offset can be up to several sectors, positive or negative, so if you're not compensating for it, in half of the cases it will add to the drive offset and produce an even larger error. for your described method to work, the audio data would have to be enclosed in silence (at both start and end) of at least the maximum cd+drive offset sum, but it's not.
for example: Lunar [T-45014] (cd write offset = +2072 samples, 4 bytes/sample)

at the end (7104 zero bytes of silence):
(cdoffset + driveoffset) * 4 <= silence  =>  8288 + 4*driveoffset <= 7104  =>  driveoffset <= -296 samples
(any larger and you cut off audio at the end)

at the beginning (no silence):
(cdoffset + driveoffset) * 4 >= 0  =>  (2072 + driveoffset) * 4 >= 0  =>  driveoffset >= -2072 samples
(any smaller and the audio runs into the data track)

only drives with offsets within this -2072..-296 range would get similar audio segment crcs
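the same arithmetic as a sketch (helper name made up, values from the Lunar example above):

```python
# Given a disc's write offset and the silence at each end of the audio,
# compute the range of drive offsets whose trimmed-audio CRCs would
# still match. Offsets in samples, silence in bytes, 4 bytes/sample.
BYTES_PER_SAMPLE = 4

def drive_offset_window(cd_offset, lead_silence, tail_silence):
    max_off = tail_silence // BYTES_PER_SAMPLE - cd_offset     # end limit
    min_off = -(lead_silence // BYTES_PER_SAMPLE) - cd_offset  # start limit
    return min_off, max_off

print(drive_offset_window(2072, 0, 7104))  # -> (-2072, -296)
```

a window that narrow means most real-world drives (whose offsets are usually within a few hundred samples of zero) would already disagree on this disc.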

14 (edited by gigadeath 2008-01-19 02:46:04)

Eidolon wrote:

Consequently, I've begun working on the GoodSegaCD and GoodSaturn projects on the Inn, hoping that this slightly easier method will be adopted by the Sega retrogaming community as a new de facto standard (similar to the GoodGen stuff).

Looking forward to hearing your thoughts on this!

Oh god, that's the last thing we need... ANOTHER format? Moreover, another less accurate format? If you don't want to waste time, just wait till there's a click-it-once program. It's not like "oh, CDs are going to rot". I have CDs from 1985 that sound better than anything in SegaCD games.

And you forgot to mention the biggest difference between GoodGen/No-Intro and your project: they contain byte-perfect dumps in the first place. Every dump in the GoodGen/No-Intro databases with missing/wrong bytes is tagged "hack", "bad", "overdump", "fixed" etc. for a reason, so you can't compare this "GoodSegaCD" to them at all.

You're about to start a project that in a few years will be deemed full of bad dumps; think about it. You don't have to go that way just because Kega's author thinks obsolete shit like CDRWIN works better (sorry if I sound rude)...

Another thing: even if you don't have the time to rip CDs the byte-perfect way, maybe someone in your community would have no problem doing it. IMO your approach forces them to settle for an inferior dumping method without letting them choose. You could tell your community members that there are alternatives like PSXDB that strive for a greater level of accuracy. After all, cart dump users are accustomed to perfect dumps, so why force your community to renounce them?

Finding offsets and dumping CDs with EAC takes me about the same time as opening and dumping carts... 5-10 minutes of waiting hasn't been a problem for the last 10 years of worldwide cart dumping, so why should it suddenly become an unbearable wait now?

And it's not a matter of ease of emulation either; PSXDB rips work perfectly under Daemon Tools and Kega :)

I agree that there's no real point in switching to a poor man's way of dumping CDs just because the other one presumably takes a bit longer... but I do like the 'smart checksumming' idea that you came up with (checking the data integrity of a CD or image on the fly by comparing blocks). However, I agree with themabus that without read+write offset correction you will soon run into cases where the first and last audio blocks have missing data, making CRC comparison impossible. Also, if I recall correctly, the usual cue/bin tools out there (not sure if this includes CDRWIN) don't append the gaps correctly and skip the track02 pregap just like old EAC versions do, so the chance of missing samples is even bigger.

As for scratches and not being able to dump audio correctly: EAC's error detection and rereading mechanism is pretty decent and has helped me through a lot of scratched CDs that would be impossible to dump correctly with conventional tools like CDRWIN. Also, you forgot that C2 can be used to detect errors as well (PerfectRip uses C2 to check for corruption).
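For background: C2 pointers are per-byte error flags that many drives can return alongside each raw sector (2352 flag bits, i.e. 294 bytes, per sector). A sketch of checking them, assuming you've already obtained the C2 data from the drive (how you get it, e.g. via the MMC READ CD command, is drive- and OS-specific):

```python
# One C2 bit per byte of the 2352-byte sector; any set bit marks a byte
# the drive could not read reliably.
C2_BYTES_PER_SECTOR = 294  # 2352 bits / 8

def sector_has_c2_errors(c2: bytes) -> bool:
    assert len(c2) == C2_BYTES_PER_SECTOR
    return any(b != 0 for b in c2)

def bad_byte_positions(c2: bytes):
    """Yield offsets (0..2351) of bytes flagged as unreliable."""
    for i, b in enumerate(c2):
        for bit in range(8):
            if b & (0x80 >> bit):
                yield i * 8 + bit
```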

Some discs don't actually require purposely altered EDC/ECC data, and just use normally constructed values. In such cases (PC games, usually), you can generate a 2352 bytes/sector "dump" from a standard 2048 bytes/sector image that is identical to one dumped the normal way (I don't know of much that can do this... besides using a CD drive emulator like Daemon Tools or CDemu and then running the dumping apps). In certain cases this will never be possible (e.g. PSX).
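To illustrate what such a regeneration involves for a Mode 1 sector (layout per ECMA-130: 12-byte sync, 4-byte BCD address + mode header, 2048 user bytes, a 4-byte EDC over bytes 0-2063, 8 zero bytes, then 276 bytes of Reed-Solomon P/Q parity), here's a sketch. The EDC table uses the well-known CD EDC polynomial constant; the ECC parity generation is deliberately left as a stub:

```python
# Rebuild a raw 2352-byte Mode 1 sector from 2048 bytes of user data.
# The 276 ECC bytes (Reed-Solomon P/Q parity) are NOT generated here;
# a real tool computes them too.
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"

def _bcd(n):
    return ((n // 10) << 4) | (n % 10)

# standard CD EDC lookup table (reflected polynomial 0xD8018001)
_EDC_LUT = []
for i in range(256):
    v = i
    for _ in range(8):
        v = (v >> 1) ^ (0xD8018001 if v & 1 else 0)
    _EDC_LUT.append(v)

def edc(data: bytes) -> int:
    v = 0
    for b in data:
        v = (v >> 8) ^ _EDC_LUT[(v ^ b) & 0xFF]
    return v

def build_mode1_sector(user: bytes, lba: int) -> bytes:
    assert len(user) == 2048
    m, rem = divmod(lba + 150, 60 * 75)  # +150 = 2-second pregap offset
    s, f = divmod(rem, 75)
    header = bytes([_bcd(m), _bcd(s), _bcd(f), 0x01])  # mode 1
    body = SYNC + header + user                        # bytes 0..2063
    sector = body + edc(body).to_bytes(4, "little") + b"\x00" * 8
    return sector + b"\x00" * 276  # ECC stub: real P/Q parity goes here
```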

And the creators of the ISO 9660 standard had no stake in this matter. It's a filesystem, and it wasn't their job to dictate how CDs store information at the low level; that was Philips' job (see ECMA-130 for details on the physical layout of CDs and the low-level 2352-byte sectors). You can also store ext2, FAT, UDF, or any other filesystem on a CD, and the low-level CD format is of no concern to those filesystems' authors.

(Technically speaking, redump.org DVD images are all wrong, since they don't include raw DVD sectors, which are far more difficult to access, and not all DVD drives read them in the same manner (essentially every vendor has their own proprietary commands); who's to say that non-chipped PS2s actually check data that's not in the 2048-byte user data area of a DVD sector?)

daishadar wrote:

Dear lord, as someone who's spent a decent amount of time archiving TOSEC dumps, this information is very disturbing.

If you diff'd a TOSEC ISO dump and a Redump dump, any idea how different the two files would be?  If the difference is very small then it should be possible for someone with a full set of both dumps to create patch files to convert individual TOSEC dumps to Redump dumps.  This way, TOSEC dumps could be merged to more accurate Redump dumps over time.

It's amazing that it's this complex to create 1:1 backups of CDs. It's just astounding. Did the original creators of the ISO 9660 standard never see this coming? From a high-level perspective, CDs just store bits: just read all the bits off! :)

In short, all data-only dumps (like 3DO) are easy to convert if you, for instance, mount the cuesheet in Daemon Tools and then raw-extract.

The problem with adjusting the audio is that you have to know the write offset of the original disc. With systems like the SNK Neo-Geo CD this is often pretty easy to detect from the image itself, because the audio data always tends to start at a certain position in the track (for instance, if the audio data starts 2 samples after the track start, it is safe to conclude that the write offset is +2, which is also a common PSX write offset).
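That detection heuristic is simple enough to sketch; this assumes the audio was authored to start exactly at the track start, as in the Neo-Geo CD case (the helper name is mine):

```python
# Guess the disc's write offset from the first audio track: the distance
# (in samples) between the track start and the first non-zero sample.
BYTES_PER_SAMPLE = 4

def guess_write_offset(track: bytes):
    """Return the offset in samples of the first non-zero sample,
    or None if the track is pure silence."""
    silence = b"\x00" * BYTES_PER_SAMPLE
    for i in range(0, len(track), BYTES_PER_SAMPLE):
        if track[i:i + BYTES_PER_SAMPLE] != silence:
            return i // BYTES_PER_SAMPLE
    return None
```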

Other discs with larger write offsets have a pretty big chance of having missing samples at the start or the end of the audio. When I once helped a TOSEC guy convert the SNK set to the Redump format, we came across several discs whose audio needed redumping because samples were cut off at the start or the end. I think other systems like the Saturn will have similar issues. Of course it should be possible to create a patch, but it would only make sense to convert TOSEC dumps to Redump ones for collecting purposes (and for saving bandwidth).

If a TOSEC dumper cares about the accuracy of his dumps, he can always come here and redump them using the better method. We will never convert/steal any dumps from other projects. I know that some of the TOSEC dumpers are aware of the differences, but they don't seem to care enough about them to start redumping. Maddog from TOSEC once gave me the same explanation as Eidolon did a couple of posts earlier: who really cares about 1/75th of a second?.. I think this thread makes it clear that we are the only project atm that DOES care.

chungy wrote:

(Technically speaking, redump.org DVD images are all wrong, since they don't include raw DVD sectors, which are far more difficult to access, and not all DVD drives read them in the same manner (essentially every vendor has their own proprietary commands); who's to say that non-chipped PS2s actually check data that's not in the 2048-byte user data area of a DVD sector?)

You're right, RAW reading of DVDs IS possible, but it's very difficult to accomplish: http://x226.org/?p=17 . I think the PS2 reads the DNAS ID in a special way (it should be in the user data area when extracting, but it's not). Anyway, there's no point in including the DNAS ID either, because it can be injected afterwards, and the images can't be verified when it's included (I think).

If anyone has some info on extracting DVDs at 2064 bytes/sector using custom firmware, and what the advantages are, please let us know :)


Offtopic:

ps. after some google'ing I came across this thread: http://assemblergames.com/forums/showth … p?p=253548
I have a Datel PS2 cd here that has unreadable sectors in the same region, maybe it's possible to extract them in d8 and create a bootable copy big_smile

edit: I managed to extract the sectors and get the same patterns as the other guy, but according to this thread http://club.cdfreaks.com/f52/how-datel- … on-147005/ this data has no real purpose after all

Well, thanks for all the additional clarifications.

I now understand that there really is a risk of missing some audio data in the first or last track, and I think this is a flaw in Steve Snake's proposed method. It may work fine for 99% of all dumps, but it would still produce errors in the 1% you gave examples of.

However, I still have a bad feeling about dumping RAW (2352 bytes/sector) data tracks for systems (Sega CD, Saturn) which DEFINITELY do not make use of those additional 304 bytes of information. What is the point? Those bytes are used by an "internal" (from the system's point of view) error correction algorithm to fix sector errors. The user data (from the system's point of view) is just 2048 bytes.

By ripping the RAW data from those data tracks and using that as the basis for the checksumming, you actually INCREASE the chance of producing unreliable dumps, because a 2352-byte block has less error correction applied to it than a 2048-byte block (from the CD drive's perspective).

No such argument applies to the audio sections of CDs, because there the 2352 bytes definitely constitute the user data.

The dumps Themabus and I verified together had no such problems at all, and they're full raw dumps. Our dumps matched perfectly, and we live thousands of miles from each other, using completely different PCs and CD-ROM drives :B

You can see them all in the Mega-CD section.

gigadeath wrote:

The dumps Themabus and I verified together had no such problems at all, and they're full raw dumps. Our dumps matched perfectly, and we live thousands of miles from each other, using completely different PCs and CD-ROM drives :B
You can see them all in the Mega-CD section.

That is fine - but it doesn't answer my question! WHY dump the raw data if it is not user data?

23 (edited by gigadeath 2008-01-21 15:58:10)

Choose your answer:
-to achieve consistency between systems
-to get full raw dumps (data+audio) and not Frankenstein dumps
-because if you really hate raw dumps you can convert them later, but you can't go the other way with 100% reliability
-because dumping raw takes only 10 seconds more, not 10 hours

gigadeath wrote:

-to achieve consistency between systems

You consistently dump RAW data tracks for every system, but you do not consistently dump just the user data for every system.

gigadeath wrote:

-to get full raw dumps (data+audio) and not Frankenstein dumps

I would not call the audio data RAW, because in the case of audio, the raw data really is the user data. In the case of data tracks, RAW does not equal user data.

gigadeath wrote:

-because if you really hate raw dumps you can convert them later, but you can't go the other way with 100% reliability

I don't "really hate" them, I just question their usefulness for systems which do not require the EDC/ECC data as user data. For these systems, it just produces data overhead and increases the chance for getting bad dumps as explained in my previous post.

gigadeath wrote:

-because dumping raw takes only 10 seconds more and not 10 hours

See above: usefulness and data overhead.

I can't compel you to dump raw data; you can do as you wish. Whoever put up this site made the choice to have full raw dumps for every system, a choice I agree with. Even if there are empty sectors, there's consistency across the whole dump length. And as I said, you can convert raw dumps later, but you can't go the other way with 100% reliability. That's a strong reason to me.

Then it's up to you. I don't think the line followed by Redump.org will change.