1 (edited by cHrI8l3 2009-04-03 14:25:33)

Job: merge different versions of the same game
Goal: find the best method that requires the smallest user effort for compression and decompression (and preferably doesn't use too much system memory)

Configurations used:
http://img8.imageshack.us/img8/1701/clipboard04n.jpg

Test #1:
Game: [PSX] Final Fantasy IX (Disc 1) (8 versions)
Versions order used for merging: U v1.0, U v1.1, E, F, G, I, S, J
Results table:
http://img17.imageshack.us/img17/8871/clipboard03s.jpg
Brief summary:
- 5.5gb - uncompressed
- 2.8gb - PackIso
- 379mb - ECM+Split:100mb+Rep:200mb+LZMA:32mb in ~10 minutes (~250mb for decompression!)
- 375mb - ECM+Rep:1gb+LZMA:128mb in ~25 minutes (~1gb for decompression)
- 354mb - ECM+Rep:1gb+NanoZip:1.5gb in ~40 minutes (~1.5gb for decompression)
- 344mb - ECM+Split:100mb+Rep:1gb+NanoZip:1gb in ~30 minutes (~1gb for decompression)

One version of the game & ImageDiffs for the others:
- 358mb - 7-Zip:192mb
- 344mb - NanoZip:1.5gb

Some notes:
- times were measured on a dual-core 2.5GHz CPU, and accuracy is about 90%
- it will be possible to use ECM inside FreeArc, but we need to wait for the next, more stable version - that will reduce the whole process to one command

Basic idea of splitting (see the sketch after this list):
- split the data into parts, e.g. v1.bin.001, v1.bin.002, v2.bin.001, v2.bin.002
- add all parts to the archive sorted by extension and name: v1.bin.001, v2.bin.001, v1.bin.002, v2.bin.002
- apply the repetition filter with a dictionary at least twice as large as the part size (so if you have 100mb parts you need at least a 200mb dictionary for it to work well)
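
To make the idea concrete, here is a minimal Python sketch of the splitting and ordering step. The file names, the 100mb part size and the plain-file output are assumptions for illustration; the repetition filter itself is the archiver's job, this only prepares the parts and the add order.

```python
# A minimal sketch of the splitting idea above (not part of FreeArc itself).
# File names and the 100 MB part size are assumptions for illustration.
import glob

PART_SIZE = 100 * 1024 * 1024  # 100 MB parts

def split_file(path, part_size=PART_SIZE):
    """Split 'path' into path.001, path.002, ... and return the part names."""
    parts = []
    with open(path, "rb") as src:
        index = 1
        while True:
            chunk = src.read(part_size)
            if not chunk:
                break
            part_name = "%s.%03d" % (path, index)
            with open(part_name, "wb") as dst:
                dst.write(chunk)
            parts.append(part_name)
            index += 1
    return parts

if __name__ == "__main__":
    all_parts = []
    for image in sorted(glob.glob("*.bin")):       # v1.bin, v2.bin, ...
        all_parts.extend(split_file(image))

    # Sort so that the same part number of every version sits next to the
    # others (v1.bin.001, v2.bin.001, v1.bin.002, ...); adding the parts in
    # this order lets a dictionary of ~2x the part size catch the
    # repetitions between versions.
    add_order = sorted(all_parts, key=lambda p: (p.rsplit(".", 1)[1], p))
    for name in add_order:
        print(name)   # feed this list, in this order, to the archiver
```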

Conclusions:
- no need to store ImageDiffs when merging with the repetition filter is much more convenient
- with splitting you can achieve amazing results at the cost of convenience...
- buy more RAM - there's never too much when it comes to compression

- 358mb with 7-Zip (one ecm'ed version & ImageDiffs)
- 344mb with NanoZip (one ecm'ed version & ImageDiffs)

it (NanoZip) would decompress slower than 7z, though
right?

it (NanoZip) would decompress slower than 7z, though
right?

yes, it's a symmetrical algorithm and decompression takes pretty much the same time as compression (and memory too..)

You did not try PAQ? tongue

5 (edited by cHrI8l3 2009-04-02 21:57:08)

You did not try PAQ?

no way big_smile PAQs are disqualified because I don't have all day to wait for each one to finish big_smile

- 375mb with ECM+FreeArc/LZMA in ~30 minutes
- 354mb with ECM+FreeArc/NanoZip in ~40 minutes

so it's 30-40 minutes to decompress those?

it's still too much imho
maybe for archiving, when kept to oneself - accessed infrequently and when space is really an issue
but otherwise (if such archives were to be distributed)
those minutes, multiplied by thousands of CDs and thousands of copies, would turn into years

7 (edited by cHrI8l3 2009-04-03 10:08:35)

so it's 30-40 minutes to decompress those?

it's still too much imho
maybe for archiving, when kept to oneself - accessed infrequently and when space is really an issue
but otherwise (if such archives were to be distributed)

I agree, for distribution a better method will need to be found, one that at least consumes less memory... I have in mind splitting all discs into e.g. 100mb parts and then merging with proper sorting.. with 100mb parts it would be enough to have 150-200mb of RAM for the repetition filter - I'll test this method soon

those minutes, multiplied by thousands of CDs and thousands of copies, would turn into years

c'mon man, I've recently been doing a massive recompression of about 1700 discs and it took barely 2 weeks with ultra 7-Zip on 2x2.5GHz big_smile and compressing with the same method inside FreeArc would be about 20% faster

8 (edited by topkat 2009-04-03 11:49:43)

I had some thoughts about space-saving as well. Since there are quite a lot of redundancies across various dumps, merging à la MAME's parent/clone relationships came to my mind. Finally I did a test run with Ace/Air Combat.

First I renamed each track to its crc32 hash, e.g. 'cfbe8182.bin'. Then I looked at a recent MAME dat and created a corresponding merged dat-file for the mentioned game. After that I altered the cue-files to reflect the crc filenames. With its 60+ tracks per version it was quite an insane task. As a last step, I fired up clrmame and rebuilt a merged set with the created dat-file. After torrentzip, the final filesize was around 1/3 of the individual archives. Quite nice, I think.
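
The renaming step could be scripted instead of done by hand. A minimal Python sketch, assuming the tracks of one game sit as .bin files in the current directory (nothing here touches dats or cue-files):

```python
# A rough sketch of the rename-to-crc32 step described above. Assumed layout:
# all track .bin files in the current directory.
import glob
import os
import zlib

def crc32_of_file(path, chunk_size=1 << 20):
    """Compute the CRC32 of a file, streaming it in 1 MB chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

if __name__ == "__main__":
    for track in sorted(glob.glob("*.bin")):
        new_name = "%08x.bin" % crc32_of_file(track)
        if os.path.exists(new_name):
            # same crc32 as an already renamed track: treated as a duplicate
            print("%s matches existing %s (duplicate track)" % (track, new_name))
        else:
            os.rename(track, new_name)
            print("%s -> %s" % (track, new_name))
```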

Pro:

- Good compression ratio
- Shareable archives due to torrentzip's consistent file hashes
- Compression time is quite short

Contra:

- Manual editing/creating of merged dats
- Altered/New cue-files needed
- Manually hunting down identical files across game versions can be an insane task
- No emulator/frontend support yet?!
- Hard to tell files apart with only the crc-filenames

Questions:

- What to do with files that are identical across multiple games (e.g. the Capcom dummy track)?
- How to handle multi-disc games?

Sadly, due to the huge amount of manual editing, I lost interest and deleted all my work. If anyone is interested, I could redo an example...

9 (edited by cHrI8l3 2009-04-03 13:22:29)

topkat, the idea is effective but it's definitely too much work editing those files, everything must be done manually, and you're losing the original file naming and cues, personally I would not want such a mess... smile

a good method should be doable with one command, not change anything in the original files, and decompress files to their original state - and I have some new ideas about that...

Updates:
- low-memory methods using splitting have been added to the test (config "Arc 6" needs only ~250mb for decompression)
- approximate compression times have been added to the results

However... there is still some manual work with splitting... it would be nice to have it done automatically before compression and auto-joined after decompression

10 (edited by topkat 2009-04-03 15:18:57)

cHrI8l3 wrote:

topkat, the idea is effective but it's definitely too much work editing those files, everything must be done manually, and you're losing the original file naming and cues, personally I would not want such a mess... smile

Yupp, that's why I trashed that idea

Still, a nice side effect was being able to easily audit sets with clrmame without having to (de)compress back and forth first. Hopefully someone will come up with a solution with the best of both worlds...

- no need to store ImageDiffs when merging with the repetition filter is much more convenient
- with splitting you can achieve amazing results at the cost of convenience...

dunno, to me the ImageDiff results look most appealing
it would depend on the decompression speed difference which of the two I'd consider the best

splitting into parts adds another layer just like ImageDiff does, and both speed and compression are worse
otherwise it's just too slow, imho.

when I said years I meant: a single game from such a merged 7z+ecm+flac (or tak)+ImageDiff set
should decompress in about 2-3 minutes or so
that's the most frequent scenario, nobody really needs the whole set decompressed at once,
and ImageDiff would still be faster than 10 minutes, I guess.
so, very roughly:
let it be 20 minutes of overhead per game
let it be 100 such games (or decompressions a user commits)
and they're shared by 1000 people
100 games at 3 per hour = ~33 hours = ~1.4 days
1.4 days * 1000 people = years
sure they would download faster, but downloading happens passively in the background
whereas decompression takes a full PC load and needs interaction from the user
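
Spelled out, with the same assumed numbers as above:

```python
# The back-of-the-envelope math from the post, spelled out.
overhead_min = 20          # extra minutes of decompression overhead per game
games_per_user = 100       # decompressions one user commits
users = 1000               # people sharing the set

per_user_hours = games_per_user * overhead_min / 60.0   # ~33.3 hours
per_user_days = per_user_hours / 24.0                   # ~1.4 days
total_days = per_user_days * users                      # ~1389 days
print("%.1f hours per user, %.1f days per user, %.0f days (~%.1f years) in total"
      % (per_user_hours, per_user_days, total_days, total_days / 365.0))
```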

Weeks ago I suggested combining WinRAR+QuickPar; here I explain it better:

1) One main uploader compresses everything with WinRAR, the same way you would with torrentzip. I mean one archive for every game.

2) Then he makes simple pars for all the archives (only .par2) with QuickPar.

3) Now he shares the pars with the other testers to see how many blocks are needed. Since par searches for matching blocks even when they have moved, only some blocks of every archive should have to be fixed/parred (see the sketch below).

4) Once you have the diff blocks you can share them and have identical rar archives on every PC.

5) This has to be done only once, unless a game name is changed.

Obviously everything I said has to be tested. I hope I have explained it well.
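
To illustrate the idea behind steps 3-4 (why only a few blocks per archive should need fixing), here is a toy Python sketch. It is not how par2 works internally (par2 uses Reed-Solomon recovery blocks and its own block size); it only counts how many fixed-size blocks of two copies of an archive disagree:

```python
# Toy illustration: if two people's archives are almost identical, only a few
# fixed-size blocks differ, and only those need to be repaired/shared.
# NOT how par2 works internally; the 512 KB block size is an assumption.
import hashlib
import sys

BLOCK = 512 * 1024  # assumed block size

def block_hashes(path, block_size=BLOCK):
    """Return a list with one MD5 per fixed-size block of the file."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            hashes.append(hashlib.md5(chunk).hexdigest())
    return hashes

if __name__ == "__main__":
    a, b = sys.argv[1], sys.argv[2]           # two copies of the same archive
    ha, hb = block_hashes(a), block_hashes(b)
    differing = sum(1 for x, y in zip(ha, hb) if x != y) + abs(len(ha) - len(hb))
    print("%d of %d blocks differ" % (differing, max(len(ha), len(hb))))
```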

...moreover.

If someone makes a little tool that sets the dates like packIso does, and we choose a WinRAR version that people must use for archiving (you don't have to share it, and telling people "use v4.00" is not illegal), then probably every archive will have only a few kbytes or less to be fixed.

3) Now he shares the pars with the other testers to see how many blocks are needed.

about equal to the archive size, I guess, which would be huge

Doesn't 7-Zip have greater compression than RAR, and wasn't there a purpose for keeping the tracks in separate files even when torrenting? What is being suggested here is going to kill everything that has been worked on with packIso, especially keeping the tracks separate so people can get only the tracks they need without having to download the whole image, and can even seed the tracks they already have while downloading the ones they may not. Not to mention all the current torrents that have been made using packIso (which is being accepted by the torrenting community from what I can see).

I'll admit that the packIso format may not be the best compression, but it is at least much better than the current alternative, torrentzip, and still offers the ability to keep tracks separate and keep the files the same for sharing purposes. The time it takes to unpack a packIso archive is also a big help, unlike what is being suggested here, where if I wanted to, say, play a game or make a patch I would have to wait at least 30 minutes just to decompress it, which wouldn't be very helpful at all for me even if I were to access them only very rarely.

As far as setting the date goes, that is what the rmdtrash.exe tool does in packIso. It sets the dates on the files themselves before they are compressed, so that all of the 7-Zip archives have the same date for each file on everyone's computer. It also clears the archive bit set by the OS in the file system, to further clear out unwanted data for compression.
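
This is not the real rmdtrash.exe, but a minimal sketch of the same idea: pin every file's timestamp to one fixed date and clear the Windows archive attribute before compressing. The chosen date (2000-01-01) and the use of the standard attrib command are assumptions for illustration:

```python
# Sketch of a date-normalizing step like rmdtrash.exe: give every file the same
# fixed timestamp (and clear the Windows archive attribute) so that identical
# content produces identical 7z archives on every PC.
import os
import subprocess
import sys
import time

FIXED_DATE = time.mktime((2000, 1, 1, 0, 0, 0, 0, 0, -1))  # 2000-01-01 00:00:00

def normalize(path):
    os.utime(path, (FIXED_DATE, FIXED_DATE))          # access + modification time
    if sys.platform == "win32":
        subprocess.call(["attrib", "-A", path])       # clear the archive bit

if __name__ == "__main__":
    for root, _dirs, files in os.walk("."):
        for name in files:
            normalize(os.path.join(root, name))
```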

Everyone seems to be so worked up over using clrmamepro that you have forgotten the purpose of the archives to begin with, and although I can understand why you would want to use it, the way you are going about this seems a bit far-fetched as far as actual usability for everyone. I've already mentioned that adding packIso checksums is all we would have to do in order to get a dat for packIsoed files while still maintaining our original compression. As long as there is someone who has a copy of the game that matches the database, they can make a packIso archive and post the checksums of the files; in the case of the data track, the checksum of the ecm'd file inside the 7-Zip archive should be taken instead of that of the archive itself, so that file name changes won't affect anything.

The compression percentage is not something that we need to worry about attaining, nor should we worry about whether or not the archives are compatible with any emulator or front-end application. The purpose of the archives is just to store the games in a way that makes it easy for everyone to get and decompress for use however they see fit, and right now packIso is the only thing I can see that really offers that at this time.

Doesn't 7-Zip have greater compression than RAR

RAR would compress audio better, so it makes sense if all the tracks go into a single archive.

I honestly don't think the ability of torrentzip to produce identical archives is something to worry about either.
how many people use that, really? 20 maybe, maybe less.
PackIso isn't bad, but surely if a better compression ratio/speed combination can be found, why not?
(for instance: replacing APE with TAK would improve it a lot)
that's what really matters - fast download/decompression, not the ability to join in and seed - which is pretty much useless, imho.
what really happens is: somebody uploads a file and the rest of the people fetch it and seed, that's all.
nobody cares about torrentzip.
so if somebody recompressed everything - it wouldn't be a big deal, imho.

17 (edited by cHrI8l3 2009-04-04 13:08:46)

dunno, to me the ImageDiff results look most appealing
it would depend on the decompression speed difference which of the two I'd consider the best

creating Diffs and restoring the original game from them takes a lot of time and user effort, even more than compressing/decompressing
as I showed, you can merge images ultra fast with FreeArc LZMA (merging 5gb into 380mb in about 10 min - is that long?? and decompressing everything to its original state in 5~8 min)

right now merging with low-memory decompression needs manual splitting, however... I wrote to the author of FreeArc, he is open to suggestions and there's a pretty good chance that he will include a splitting feature in one of the next FA releases
+ add to that ECM filtering done automatically inside FreeArc
and imagine... in the near future it will be possible to merge with even less than 256mb of RAM and with just a single command line or a few clicks inside the GUI

if you had the audio tracks stored as .wav you could configure FA to compress those for you with the APE format and the .bin files with LZMA
...and if you didn't want to convert to .wav you could create the archive in 2 steps (see the sketch below):
1. create the archive and add all data tracks with the LZMA method
2. add all audio to the previously created archive with the APE codec
isn't that convenient?
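
A rough sketch of that 2-step workflow. The archiver invocation and the method switches are placeholders (check the FreeArc documentation for the real syntax); the point is only the ordering: data tracks first with one method, then audio appended to the same archive with another:

```python
# Sketch of the 2-step archive creation described above. The command name and
# the method switches are PLACEHOLDERS, not real FreeArc flags; fill them in
# from the FreeArc docs before actually running anything.
import glob

ARCHIVE = "merged.arc"
DATA_METHOD_ARGS = ["<lzma-method-switch>"]   # hypothetical placeholder
AUDIO_METHOD_ARGS = ["<ape-method-switch>"]   # hypothetical placeholder

data_tracks = sorted(glob.glob("*.bin"))      # step 1: data tracks
audio_tracks = sorted(glob.glob("*.wav"))     # step 2: audio tracks

data_cmd = ["arc", "a", ARCHIVE] + DATA_METHOD_ARGS + data_tracks
audio_cmd = ["arc", "a", ARCHIVE] + AUDIO_METHOD_ARGS + audio_tracks

for cmd in (data_cmd, audio_cmd):
    # print first; switch to subprocess.check_call(cmd) once the real
    # switches are filled in
    print(" ".join(cmd))
```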

and don't forget, merging right now is only experimental
I'm only trying to show some possibilities, maybe someone will try them for saving HDD space smile just remember to test your data after compression if you deal with unstable alpha versions of software tongue

Doesn't 7-Zip have greater compression than RAR, and wasn't there a purpose for keeping the tracks in separate files even when torrenting?

7-Zip does much better with data than RAR
RAR does much better with audio than 7-Zip
APE does even better with audio than RAR, because it is a dedicated audio algorithm
APE is very fast and works with little memory
that's basically it...

and when it comes to audio... imho APE is a good format, it is fast, well known and supported by a lot of other software, yes you can get a better ratio with TAK... but so what? you can get even better (+ faster) with LA

creating Diffs and restoring the original game from them takes a lot of time and user effort, even more than compressing/decompressing

it would be strange if so
- when decoding with ImageDiff it would only insert certain pieces of data - it should be really fast (see the toy sketch below)
there would be only one ECM'ed file (the Reed-Solomon calculation) and the decompression algorithm (the most complex part) would need to process far less data
- when extracting the whole set from a single archive there's ECM for every file and the decompression algorithm would process more data
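
A toy illustration of why applying a diff is cheap: reconstruction is just the base data with some ranges overwritten. This is not ImageDiff's actual file format; the (offset, data) patch list is invented for the example, and real diffs can also insert or remove data:

```python
# Toy patch application: the reconstructed image is the base image with a few
# byte ranges overwritten. NOT ImageDiff's real format, just the general idea.
def apply_patches(base_path, out_path, patches):
    """patches: list of (offset, bytes) pairs to overwrite in the base image."""
    with open(base_path, "rb") as f:
        data = bytearray(f.read())
    for offset, blob in patches:
        data[offset:offset + len(blob)] = blob
    with open(out_path, "wb") as f:
        f.write(data)

# Hypothetical example: turn one version into another by overwriting two spots.
# apply_patches("game_v10.bin", "game_v11.bin",
#               [(0x1000, b"\x01\x02\x03"), (0x20000, b"patched sector data")])
```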

imho it doesn't really matter what the compression speed is, as long as it's sane, since it's done only once

and when it comes to audio... imho APE is a good format, it is fast, well known and supported by a lot of other software, yes you can get a better ratio with TAK... but so what? you can get even better (+ faster) with LA

well, from what I read, it's anything but fast and isn't that widely supported either
FLAC is faster (about 2-4 times) and more widely used, but its compression is slightly worse (2-3%)

compression:
http://flac.sourceforge.net/comparison_all_ratio.html
http://synthetic-soul.co.uk/comparison/lossless/
hardware/software support:
http://wiki.hydrogenaudio.org/index.php … comparison
http://flac.sourceforge.net/comparison.html
http://en.wikipedia.org/wiki/Comparison_of_audio_codecs

TAK is experimental - true, but it offers compression of APE at the speed of FLAC

LA is very-very slow

oh, I see - if you were to extract a single file from the archive
for ImageDiff you'd need to run it through ECM and ImageDiff
but for FreeArc only ECM (and then join the files)
but still, decompression should be notably longer for FreeArc since it's 5.5gb vs 700mb
so what's lost on ImageDiff should be regained on decompression

LA is very-very slow

as far as I've been testing, it's slightly faster than APE (using the maximum possible settings), hmm, need to check that again...

but for FreeArc only ECM (and then join the files)

I've been trying to explain that FreeArc will soon be able to UnECM (+ join) data automatically after decompression - so the only thing the user will need to do is click on "Decompress" to restore the original .bin's, without any further processing of files...

but still, decompression should be notably longer for FreeArc since it's 5.5gb vs 700mb

unpack 5.2gb (8 files) from a 380mb FA archive: ~7 minutes
unpack 650mb from a 350mb 7z: ~80 seconds
unpack 8 x 650mb from 2gb of 7z ... 8*80 = ~640 seconds (~11 minutes) big_smile
...times checked on an external USB drive, with an internal SATA drive it should be better... wink

I've been trying to explain that FreeArc will soon be able to UnECM (+ join) data automatically after decompression - so the only thing the user will need to do is click on "Decompress" to restore the original .bin's, without any further processing of files...

ok, so it will be more convenient
but it's possible to make a frontend or a script for command-line applications - many do that now
there are quite a lot of programs (for CD recording, video/audio encoding/processing and so on)
that are actually only GUIs for command-line *nix applications

unpack 5.2gb (8 files) from a 380mb FA archive: ~7 minutes
unpack 650mb from a 350mb 7z: ~80 seconds
unpack 8 x 650mb from 2gb of 7z ... 8*80 = ~640 seconds (~11 minutes) big_smile

but for your example it's the 2nd case, right?
there wouldn't be a 2gb archive, it would be 350mb.
so ImageDiff would have 5 minutes to complete in, and actually it shouldn't be much slower than joining files.
(it basically is joining files)

edit:
so anyway, such a merged set (7z+ecm+ImageDiff+tak) seems a very good idea to me
from the example above it's an 8x compression improvement over PackIso
it would be rare, of course, to have so many versions merged
but generally it would still save a lot of time and space
(even single-title decompression from this set shouldn't be slower than PackIso on average, I think,
because of the speed gain from TAK, and if TAK implemented support for RAW audio data
sox.exe could be eliminated, as it's now used only to add/remove the RIFF header, sketched below,
which means every audio track is basically copied after extraction,
which is about the same as doing ImageDiff)
kudos to cHrI8l3 for that
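
For reference, this is essentially all the sox.exe step does here: CD audio is raw 16-bit stereo 44.1kHz PCM, and a .wav is just that data behind a RIFF header. A minimal Python sketch (file names are placeholders):

```python
# What the sox.exe step amounts to: CD-DA tracks are raw 16-bit stereo
# 44.1 kHz PCM, and a .wav is just that data behind a RIFF header.
import wave

def raw_to_wav(raw_path, wav_path):
    """Wrap a raw CD audio track in a RIFF/WAV header."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(2)        # stereo
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(44100)    # CD sample rate
        w.writeframes(pcm)

def wav_to_raw(wav_path, raw_path):
    """Strip the RIFF header to get back the raw track data."""
    with wave.open(wav_path, "rb") as w:
        pcm = w.readframes(w.getnframes())
    with open(raw_path, "wb") as f:
        f.write(pcm)

# Usage (placeholder names):
# raw_to_wav("track02.bin", "track02.wav")
# wav_to_raw("track02.wav", "track02.bin")
```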

22 (edited by cHrI8l3 2009-04-04 18:33:45)

but for your example it's the 2nd case, right?
there wouldn't be a 2gb archive, it would be 350mb.
so ImageDiff would have 5 minutes to complete in, and actually it shouldn't be much slower than joining files.

I don't know if I understood your point correctly... the example on which I based those decompression times uses compression WITHOUT ImageDiff (config "Arc 6" from the base topic)

this is what that 380mb archive contains:
http://img12.imageshack.us/img12/269/clipboard05h.th.jpg

and that ~2gb of 7z is the total of 8 packIso archives (8*350mb)

ps. in a few days I'll do a merging test on a game with audio tracks smile that may be interesting...

unpack 650mb from 350mb 7z ~80 seconds

is this FreeArc?

ok, so I guess that's how fast a single version would extract from the 378mb FreeArc archive
could you then please test extraction of the .ecm + one .imageDiff from the 357mb .7z and the 344mb .nz
unecm would be the same, and ImagePatch would be slightly slower than joining, I guess

edit:
noooo, wait
ummm it says 7z

that's what I thought:
1st: FreeArc - pretty slow
2nd: 7z + ImageDiff - fast

I mean, once you have the files decoded from the ImageDiff .7z you'd still beat FA by about 4-5 minutes
plus the compression is better

Haldrie mentioned it is easier to get just one track instead of all of them when something is compressed with packIso.
But if you need a single audio track it is easier to get the whole archive than to figure out how to properly turn the APE file back into the bin. Decompressing to wav is easy, but removing the RIFF header requires knowledge of the sox.exe commands.

current stats:

"Sony PlayStation (2484) (2009-04-05 15-16-43).dat"
Records in DB total: 2484
Records with Audio : 774
------- Size (bytes) -------
Total: 1298980571256
Data : 1169375041272
Audio: 129605529984

"Sega CD - Mega-CD (95) (2009-02-23 19-41-59).dat"
Records in DB total: 95
Records with Audio : 94
------- Size (bytes) -------
Total: 43296314544
Data : 25283397888
Audio: 18012916656

audio actually makes up only ~10% of the PSX data (quick check below),
though it's off a little, since lately I submitted about 100 CDs without CDDA by selection
and others might be doing the same since CDDA dumping is such a drag
but anyway, for older consoles it's far more significant
so I guess extraction from PackIso might be slightly faster than from a merged set on average for PSX
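
Quick check of those percentages against the byte counts quoted above:

```python
# Audio share of the totals, using the byte counts from the dats quoted above.
psx_total, psx_audio = 1298980571256, 129605529984
segacd_total, segacd_audio = 43296314544, 18012916656

print("PSX audio share:     %.1f%%" % (100.0 * psx_audio / psx_total))        # ~10.0%
print("Sega CD audio share: %.1f%%" % (100.0 * segacd_audio / segacd_total))  # ~41.6%
```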

edit:
PackIso is locked to 7-Zip 4.53, which is unfortunate

4.54 beta      2007-09-04
-------------------------
- Decompression speed was increased.

4.58 beta      2008-05-05
-------------------------
- Some speed optimizations.

and it uses default settings, hence producing less compression,
but as I understand it, the compression mode for 7z does not significantly influence decompression speed
so I don't know - it's difficult to tell without measuring

edit:
I mean extraction of a single title
the whole merged set should decode a lot faster
but it's an unlikely scenario, imho