1 (edited by V. 2013-01-09 20:31:37)

Howdy,

I am not sure if what I made is new, but searching around didn't turn up anything.
In any case, after not finding anything useful, I made a new crc tool to find (in a fast way) a matching crc in a dump with some offset.
This was whipped up in a few hours, so it is still kind of rough and not idiotproof (no software of mine is idiotproof, actually ;))

Usecase:
Finding an offset in a dump of specific audiotracks to match against the redump database.
If all tracks match with an offset to redump, it is further confirmation that the dump was successful.
This is NOT meant to find an actual drive offset, or to be used to match up a dump in order to validate an entry!
Use the proper redump guides to dump discs! Different pressings of audio discs have factory offsets, which is mainly what this tool detects.

README:

What:
 This is a not yet idiotproof version of findcrcs.
 It is to be used for finding a block of data which matches a specific crc.

How:
 findcrcs <file> <size of window> <crc> [more crcs...]

 File is a big file which should or may contain the searched for data.
 Size of window is the size of the block of data to find.
 Crc is the crc to find in the file (there may be more than one, but all are matched against the same window size).

 If a match is found it will print out an md5sum of the matched block for further inspection.
 For best results, add some (1MB or so) zero bytes padding around the file first.
 In a future version, this might be a selectable option of this program.
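
A minimal sketch of the sliding-window search described above, in Python rather than the tool's own C++. It assumes a rolling CRC-32: the register is updated by removing the byte that leaves the window and adding the byte that enters it, instead of recomputing every window from scratch. Whether findcrcs does exactly this internally is an assumption, and `find_crcs` here is a hypothetical function, not the tool's code.

```python
import hashlib
import zlib


def _make_table():
    """Standard reflected CRC-32 table (polynomial 0xEDB88320, as used by zlib)."""
    table = []
    for i in range(256):
        r = i
        for _ in range(8):
            r = (r >> 1) ^ 0xEDB88320 if r & 1 else r >> 1
        table.append(r)
    return table


_TABLE = _make_table()


def find_crcs(data, window, targets):
    """Return (offset, crc, md5) for every window of `window` bytes whose CRC-32 is in targets."""
    targets = set(targets)
    n = len(data)
    if window <= 0 or window > n:
        return []
    # CRC of an all-zero window; XORing it away leaves a purely linear register.
    zero_crc = zlib.crc32(bytes(window))
    # Contribution of a byte sitting at the very start of the window.
    out = [zlib.crc32(bytes([b]) + bytes(window - 1)) ^ zero_crc for b in range(256)]
    reg = zlib.crc32(data[:window]) ^ zero_crc
    hits = []
    for start in range(n - window + 1):
        crc = reg ^ zero_crc
        if crc in targets:
            block = data[start:start + window]
            hits.append((start, crc, hashlib.md5(block).hexdigest()))
        if start + window < n:
            reg ^= out[data[start]]  # drop the byte leaving the window
            reg = _TABLE[(reg ^ data[start + window]) & 0xFF] ^ (reg >> 8)  # add the new byte
    return hits
```

The pure-Python loop is far too slow for a real CD image; the point is only that each step costs a couple of table lookups, regardless of the window size.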

Why:
 Useful for finding audio offsets in disk images together with the redump.org database.

Warning:
 This software is not yet idiotproof!
 - It does not check arguments for validity yet (especially size of window and crcs).
 - No padding option yet.
   If matching audio data, you should pad the combined audio tracks with zero bytes at the start and end.
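
Until a padding option exists, the padding can be done by hand. A minimal sketch in Python, assuming 1 MB of zero bytes on each side as suggested above; `pad_file` is a hypothetical helper, not part of findcrcs.

```python
import shutil


def pad_file(src_path, dst_path, padding=1024 * 1024):
    """Write a copy of src_path with `padding` zero bytes before and after it."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(bytes(padding))      # leading zero bytes
        shutil.copyfileobj(src, dst)   # original contents, unchanged
        dst.write(bytes(padding))      # trailing zero bytes
    return padding
```

Offsets reported for the padded copy are shifted by `padding`; subtract it to get the offset relative to the original file (the result can be negative).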

Compiling:
 Use "make" on any linux/unix/bsd console nearby, or if you must, an msys or cygwin environment.
 You need to use a relatively recent gcc (4.5.0+ ish I guess).
 This software uses crcutil-1.0 for providing fast crc calculations.
 crcutil is made by Andrew Kadatch and Bob Jenkins and can be found on http://code.google.com/p/crcutil/
 Do not contact them for support on findcrcs.
 The Makefile will try to pull in version 1.0 through wget if it is not supplied yet.

 Also, this program makes use of the MD5 implementation of Alexander Peslyak.
 This is found at http://openwall.info/wiki/people/solar/software/public-domain-source-code/md5
 A small casting patch was made to support g++; this patch is released under the same license as the original md5.c file.

Contact:
 At the moment, see the redump.org forum thread where you got this.

-V.

Disclaimer:
I write my tools mainly for myself to use in a specific way.
If, however, someone has issues using this or has suggestions, I MIGHT be able to change, fix or add things, but only if time and effort permit (which is not usually the case).

Source: http://winaoe.org/findcrcs-0.2.tar.gz
Win32 binary: http://winaoe.org/findcrcs-0.2-bin-win32.zip

2 (edited by Jackal 2013-01-09 10:27:04)

Hi,

thx for this useful tool. We already had the psxt001z --track option and there was another tool (by themabus?), but from what I recall those weren't really suitable for full images and only worked well on individual tracks.

V. wrote:

For the moment linux only, but I am considering a windows version (cli only)

This means your target audience for the moment will only be a fraction of what it could be. Maybe release a cygwin build (with the necessary dll files included) to get the windows folks going?

Regards

3 (edited by gaijin 2013-01-09 11:40:42)

psxt001z is a bit outdated and slow; it uses one core and a step of 4 bytes.

themabus' fff.exe is much better: it uses multiple cores and a step of 1 byte.

He also has an interesting recombine tool: one bad image + another version of the same bad image = a good image or tracks.

4 (edited by V. 2013-01-09 21:00:31)

I was not aware of those 2 programs, so I tested them against mine.

First off, I ported the thing to windows/MinGW, so no need for cygwin dll's
Updated the initial post for a source and binary release of v.0.2.

Benchmarking the 3 programs was done on a combined bin (data + audio) of Moto Racer 1.
Image is 574,551,264 bytes big (around 75% of a full CD).
The target crc to find is 9c8f607e with a track size of 46,435,536 (track 6 of this listing: http://redump.org/disc/18266/)

first off: findcrcs

~/findcrcs-0.2$ time findcrcs.exe "Moto Racer 1.bin" 46435536 9c8f607e
348061372  9c8f607e  7ca7c0881f28f2684623b3a2ae53e95b

real    0m5.743s
user    0m0.000s
sys     0m0.047s

Found with the correct md5 on offset / index 348061372.
Meaning bytes 348061372 to 348061372+46435536 correspond to the track 6 in the dump information.

Done in 5.743 seconds.
In these 5.743 seconds, a total of 528,115,728 crcs were checked (one per possible window position: 574,551,264 - 46,435,536), which works out to around 90,000,000 crcs per second.
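
The throughput figure can be re-derived from the numbers in this post. A small sketch; the only inputs are the image size, the track size and the "real" time printed by `time`.

```python
image_size = 574_551_264   # bytes, size of the combined bin
window_size = 46_435_536   # bytes, size of track 6
elapsed = 5.743            # seconds, "real" time reported above

# Count used in the post; strictly there is one more position (size - window + 1).
positions = image_size - window_size
rate = positions / elapsed

print(f"{positions:,} window positions")
print(f"about {rate / 1e6:.0f} million crcs per second")
```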


Next up: psxt001z

A complete search of the whole image was not doable with this one; it took way too long.
So instead I let it search -1000 to +1000 around index 348061372 (found above to be correct) to get at least some idea of the speed.

~/findcrcs-0.2$ time psxt001z.exe --track "Moto Racer 1.bin" 348060372 46435536 9c8f607e
psxt001z by Dremora, v0.21 beta 1

File: Moto Racer 1.bin
Start: 348060372
Size: 46435536
CRC-32: 9c8f607e

Offset correction 0 bytes, 0 samples, CRC-32 dbf4a270
Offset correction 0 bytes, 0 samples, CRC-32 dbf4a270
Offset correction 4 bytes, 1 samples, CRC-32 aaeba36a
Offset correction -4 bytes, -1 samples, CRC-32 4abb692b
...
...
Offset correction -992 bytes, -248 samples, CRC-32 a87ee8ab
Offset correction 996 bytes, 249 samples, CRC-32 3607c430
Offset correction -996 bytes, -249 samples, CRC-32 97e367cd
Offset correction 1000 bytes, 250 samples, CRC-32 9c8f607e

DONE!

Offset correction: 1000 bytes / 250 samples

real    2m27.556s
user    0m0.031s
sys     0m0.000s

This was a search of around 500 crcs (it steps 4 bytes at a time, as also mentioned by gaijin).
Meaning it does around 3.5 crcs per second (500 crcs in about 147.5 seconds).
findcrcs beats this by a factor of around 25,000,000.


Lastly: fff
This is faster than psxt001z, but still way too slow to do a full image.
So, again, it gets a 2000 byte window, with the default of a 4 byte step.
It can do fewer bytes per step, but 500 crcs gives us a clear comparison to psxt001z.

~/findcrcs-0.2$ time fff.exe -offset=348060372 -size=46435536 -crc=0x9c8f607e "Moto Racer 1.bin"
FindFileFragment @20100709 / themabus@inbox.lv
----------------------------------------------
Input: Moto Racer 1.bin
Offset: 348060372
Size: 46435536
CRC: 9c8f607e
Shift: both
Step: 4
Range: 20000

Offset correction 0 bytes, CRC-32 dbf4a270
Offset correction 4 bytes, CRC-32 aaeba36a
Offset correction -4 bytes, CRC-32 4abb692b
Offset correction 8 bytes, CRC-32 7ed53192
...
...
Offset correction -992 bytes, CRC-32 a87ee8ab
Offset correction 996 bytes, CRC-32 3607c430
Offset correction -996 bytes, CRC-32 97e367cd
Offset correction 1000 bytes, CRC-32 9c8f607e

Fragment found!

real    0m26.102s
user    0m0.015s
sys     0m0.015s

This was 500 crcs in 26.102 seconds, meaning around 19 crcs per second.
That is faster than psxt001z by a factor of around 5, but still 4,700,000 times slower than findcrcs.


So.... yeah.... ;)


Anything above an offset of 10,000 is not really doable with fff and out of the question for psxt001z.
10,000 is not even that much of a factory offset between different cd pressings, so I think findcrcs has a use.

In any case, as said, I updated the initial post for a 0.2 update and windows binary release.
Enjoy.

Excellent tool, V.

Is there any chance of adding an option to output any found fragments to a new file(s)?
e.g.

findcrcs <file> <size of window> <crc> <outfile>

6

Thanks.
I used a bit of shell script for that, but I guess that on windows it would be easier to have the tool do it itself.
I'll see what I can do in the next revision. I'll probably add a "slice" tool instead of having findcrcs do it itself.
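
Until such a tool exists, a minimal sketch of the slicing step in Python: it copies `size` bytes starting at `offset` into a new file and returns the MD5 of what was written, so it can be compared with the one findcrcs printed. `slice_file` is a hypothetical helper, not the planned tool.

```python
import hashlib


def slice_file(src_path, offset, size, dst_path, chunk=1 << 20):
    """Copy size bytes starting at offset from src_path to dst_path; return their MD5."""
    md5 = hashlib.md5()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        remaining = size
        while remaining > 0:
            block = src.read(min(chunk, remaining))
            if not block:
                raise ValueError("source file ends before offset + size")
            dst.write(block)
            md5.update(block)
            remaining -= len(block)
    return md5.hexdigest()
```

For the benchmark in post 4 this would be called as `slice_file("Moto Racer 1.bin", 348061372, 46435536, "track06.bin")`, where the output file name is only an example.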

Tested. This tool is really impressive.


This is the test I have performed:

Using IsoBuster (no offset correction, no error detection for audio...), I extracted a full image of this audio disc, which I dumped in the past. Then I ran findcrcs.exe "image.bin" 95020800 dd0e562e. After 10-11 seconds (I own a budget CPU), findcrcs found two fragments with a matching CRC32 (I assume due to hash collisions), the second one with the expected MD5 hash.

G:\>findcrcs.exe "image.bin" 95020800 dd0e562e
120031802  dd0e562e  87ac7985fcc46286efc5c0876f723e5e
619521696  dd0e562e  8855ffc1921ec4e5d7536272ced3d989
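
A rough way to judge how likely such accidental CRC32 matches are. This is a sketch that assumes the CRC of each window behaves like a uniformly random 32-bit value; real audio data (long runs of silence, for example) can deviate from that.

```python
def expected_false_matches(positions, targets=1):
    """Expected number of accidental CRC-32 matches under a uniform-random assumption."""
    return positions * targets / 2**32


# Example with the Moto Racer 1 figures from post 4 (image size minus window size).
print(expected_false_matches(574_551_264 - 46_435_536))
```

For that image the estimate is about 0.12 accidental matches per searched crc, which is why comparing the MD5 of each hit is worthwhile.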


Thanks, V.

Agree with pablogm123: match finding is very fast. :)

9 (edited by HwitVlf 2018-01-24 13:02:08)

This scanner works well enough that I made a simple GUI front-end for my own use. It uses info from the Redump database and extracts tracks that are found.
https://image.ibb.co/mbj1E6/GUI.png

In case it helps anyone else, it is HERE. Source code (Autoit) included.

EDIT Link Updated to v7

10 (edited by rosewood 2021-03-01 15:45:20)

I can't get your GUI to work, it always throws the error "Track Information is not formatted correctly."

So I compiled v0.3 of findcrcs for win64; you can download it here: findcrcs-0.3-bin-win64.7z
This new version also supports extraction from the command line:

Usage: findcrcs [OPTION]... [--] <FILE> <WINDOWSIZE> <CRC> [MD5] [CRC [MD5]...]

Find the offset of CRCs in FILE with a window size of WINDOWSIZE.
Outputs the crc, offset and md5 of a found segment.
If an MD5 is given, it will only output or extract on a matching md5 hash.

  -e              extract the found segments with the md5 hash as filename
  -f EXTRACTFILE  use EXTRACTFILE as file to extract to
                  implies -e and -q
  -p PADDING      use PADDING amount of zero bytes around the input file
                  this can result in a negative offset in the results
                  if used with -s only an end padding will be added
  -q              quit processing after finding a match and optionally
                  extracting that match
  -s SEEDFILE     get an initial crc from SEEDFILE
                  if used with -e, the SEEDFILE will be joined with the found
                  segment

11 (edited by HwitVlf 2018-01-24 13:01:06)

That's unfortunate.  EDIT SEE POST 15

I use findcrcs mostly to extract the split tracks from a multi-track game that I dumped into a single image. It's easy to make a frontend that searches for one track at a time from user-entered size/CRC, but it's miserable to enter all that information for a game with 40 tracks. So the question is how to automatically parse a game listing, extracting size/crc, in a way that's compatible with all the listings here. I've used my frontend for PS1 game listings extensively with no problems.

The upgrades in your v3 are quite useful.

12 (edited by HwitVlf 2018-01-24 12:58:21)

Anything worth doing is worth doing right, so I care to make this tool decent. But I'm also fairly busy, so there's  a small window when I'm willing to put effort into refining the GUI. If anyone cares to do some testing to iron out bugs, speak now or forever hold your peace.

EDIT See post 15 for GUI

@rosewood I tried your build on some larger ISOs. It rejects some large windows with a message, but it runs on some ~4GB images and incorrectly lists no matching CRCs. It would be better to add a limit message to my GUI than to have it erroneously report 'no match'. Do you know what the size cap for an accurate scan is?

I figured out what was wrong. It seems that it depends on the browser how the data is copied. IE11 - copies all values separated with a space and rows separated with a space and line break. Edge - copies all values/rows separated with a line break. Firefox - does it right.
Your program requires a fixed formatting (i.e. all values separated with a tab and all rows separated with a line break). Maybe you should implement a check for whitespace characters first before doing the scan.
For now I'll use Notepad++ to change the data into the correct format, e.g. replace ' \r\n' with '\r\n' and ' ' with '\t'.
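
One way to make the parsing independent of how a browser separates the copied values is to ignore the layout entirely and scan the whitespace-separated tokens. A minimal sketch in Python (the GUI itself is AutoIt); it assumes each track row contains the size in bytes, the 8-digit CRC-32 and the 32-digit MD5 next to each other in that order, which is an assumption about the listing, and `parse_tracks` is a hypothetical helper.

```python
import re

_SIZE = re.compile(r"[0-9]+")
_CRC = re.compile(r"[0-9a-fA-F]{8}")
_MD5 = re.compile(r"[0-9a-fA-F]{32}")


def parse_tracks(text):
    """Return (size, crc, md5) tuples found in copied track info, whatever the separators."""
    tokens = text.split()  # splits on spaces, tabs and any kind of line break
    tracks = []
    for i in range(2, len(tokens)):
        size, crc, md5 = tokens[i - 2], tokens[i - 1], tokens[i]
        # Assumed column order: size in bytes, CRC-32, MD5.
        if _SIZE.fullmatch(size) and _CRC.fullmatch(crc) and _MD5.fullmatch(md5):
            tracks.append((int(size), crc.lower(), md5.lower()))
    return tracks
```

Because only the token order matters, the same function handles values separated by tabs, by spaces with a trailing space per row, or by line breaks.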

By the way, I am not the author, nor did I change anything in the code of findcrcs. It just happened that the link to v2 was unavailable for some time, so I searched the web for the file and found the updated source on github. I couldn't get it to compile under Windows, and in the end I used Ubuntu with Win64 target settings.

Thank you rosewood, very helpful info. I'm working on a fix for IE and testing other browsers. I implemented a variable detection in v4, but it will still fail with IE because of the extra space added to row-ends.

The frontend currently seems to work with track-info copied from Firefox, Midori and Opera v12.18.

15 (edited by HwitVlf 2018-01-24 13:05:08)

HERE is the final version of the GUI. I tested with every browser I could find and the only ones that bungled the formatting were Microsoft IE and Edge. I added a fix for IE, but I don't really care to fool with Edge when so many other browsers work properly. Microsoft products just aren't what they used to be.

As far as changes in the GUI go, I tested on about 30 BINs from my garbage folder of bad dumps and fixed several bugs, some significant, some only triggered by rare random factors. Hopefully it'll be useful to someone. ;)

The v0.3 findcrcs appears to have bugs (not the GUI but findcrcs itself). I confirmed that it doesn't detect CRCs in some files as it should. It will not detect Track1 of the game HERE, but v2 does.

Checking the archive of v7 I'm getting a virus / malware alert: Pynamer.A!ac!?

same here.

It's wise to be cautious. I did include the source code, but I doubt anyone cares or knows enough to go through it. If you open the source code and search for 'filewrite', you will find the only interaction the code has with the PC that can actually change anything. This is the line that writes the extracted tracks to the hard drive. 'FileRead' will show you the only place it gathers any data from your PC; this is when it's reading the tracks from the source CD image.

This tool is written in Autoit language. Autoit doesn't compile the script into machine code, but stores it internally in an executable wrapper. Since all Autoit scripts share the same wrapper, it's fairly common for the language to produce false positives. You can read about it HERE. I have Comodo anti-virus, and for whatever reason, it alerts every now and then when I'm compiling one of my AutoIt scripts.

I know we live in an age when people can't really be trusted, but for what it's worth, I give my word that I wouldn't produce anything like a virus.

On a side note, after I made this front end, I found that someone else had already made a similar tool HERE. If you're still worried about the false-positive, it's a very good alternative. I prefer the smaller footprint of my version, but the built in tools in the other front-end are very nice too.

19 (edited by HwitVlf 2018-02-18 03:05:59)

I ran the frontend through VirusTotal and it was tagged as virus by about 30% of the scanners. With those results, I wouldn't even use it if I hadn't written it. The tool is insanely simple so I wondered what was causing so many alerts.

After some testing, it appears to be from a function that pauses/resumes findcrcs.exe. Normally a front end would wait for the tool to finish and then process the resulting console output. But if findcrcs is scanning for a track that is all 00s, it will often return thousands of hits and take forever to finish. To overcome this long wait, my frontend monitors the findcrcs output; when some results have been found, it pauses findcrcs and tests the result's MD5. If there is a match, it terminates findcrcs; if there is no match, it resumes findcrcs and continues to wait.
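
An alternative that needs no suspend/resume is to read the output line by line and kill the process as soon as a line carries the wanted MD5. A minimal sketch in Python (the frontend itself is AutoIt), assuming the three-column `offset crc md5` output shown earlier in the thread; `first_md5_match` is a hypothetical helper. The v0.3 usage text in post 10 also lists an optional MD5 argument and a `-q` switch, which may make this unnecessary.

```python
import subprocess


def first_md5_match(cmd, wanted_md5):
    """Run cmd, stop at the first output line whose md5 column matches; return (offset, crc)."""
    wanted = wanted_md5.lower()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            fields = line.split()
            # Assumed output format: "<offset>  <crc>  <md5>"
            if len(fields) == 3 and fields[2].lower() == wanted:
                return int(fields[0]), fields[1]
        return None
    finally:
        if proc.poll() is None:
            proc.kill()  # stop scanning as soon as the answer is known
        proc.stdout.close()
        proc.wait()
```

It would be called as `first_md5_match(["findcrcs.exe", "image.bin", "95020800", "dd0e562e"], "8855ffc1921ec4e5d7536272ced3d989")`, using the values from post 7.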

This 'pause exe' function is built into Windows, you'll find it near the bottom of the source code as  'Func _ProcessSuspend()' and 'Func _ProcessResume()'. Apparently the virus scanners think using it means you're up to no good. I 'commented out' these two functions and VirusTotal went down to 2 no-name positives.

It's rather heavy handed for virus scanners to alert on anything using the pause function, but anyways, I'm submitting a false positive notice to a few of the bigger name companies. We'll see what happens.