duplicate file finder

I have finally decided to sort out my photo files. They are scattered over 4 hard disks and I'm gathering them all into one directory. They likely contain many duplicates. I once had a simple duplicate file finder but it's long gone. I'm looking for a free one, but searches keep bringing up "free" downloads which only offer a free trial period followed by a compulsory purchase. I'm too scabby to lash out on a programme I will only use once. Any suggestions for Windows 10?

fred

I used to use Visipics -- but that fails on larger photos, just ignores them.

Reasonably happy with:

formatting link
Free, complete.

Thomas Prufer


fred snipped-for-privacy@gmail.com posted

CCleaner version 5.63.7540 has one. It will match files by any combination of name, size, date or content, and can be set to ignore files under or over a threshold of your choice.

Algernon Goss-Custard

Do these duplicate file finders just find duplicate files, or can you delete one or the other of say a pair, and if you do, will the file finder redirect anything that was pointing to the now-deleted file to the one remaining IYSWIM?

Chris Hogg

That particular one is free. There is Master Seeker of course, which looks for files, but I have often found that the main problem with duplicate finders is that files can have the same name but be different files, and often the only way to know is from the length and the date they were made, and in the end by testing them. For you it's visual, but most of my issues tend to be with sound files. No, it will delete, but really you have to be the person who adds a file to your list. I tend to make a directory somewhere, copy the wanted ones in there, then delete both from their oddball places. Brian

Brian Gaff

Chris Hogg snipped-for-privacy@privacy.net posted

In the above version of CCleaner you can delete any of a group of identical files.

I'm not sure what you mean by "file finder". Do you mean, what happens when you click on a shortcut pointing to the now-deleted file? If so, on my version of Windows it invokes the "Missing Shortcut" routine to find the file, and will eventually find its alter ego (assuming it's got the same name).

Algernon Goss-Custard

Just downloaded it, spent ages scanning files, started marking files to be deleted. Selected Delete, and got the message "delete up to 15 files free for more you must buy the full version"!!!

Mike

Mike Rogers

My experience with Easy Duplicate Finder (goodness knows who makes it - there's no Help | About menu option) is that you can configure what attributes it uses when deciding whether duplicate names are the same file. Timestamp and file size are usually good enough. I *think* it may have the option to generate a hash key from the file contents to give greater confidence that it's the same file, but that slows down comparison considerably, even if it's only applied to files with the same name.

That reminds me. I need to do a duplicate file check on my PC. I copied various files/folders from a dead laptop (dead motherboard) some time ago. Recently I needed that dead PC's hard disk (*) so I did a quick blanket copy of everything onto my main PC: now I need to go through and weed out the duplicates and the unwanted dross. It is tempting to say that anything that I didn't already have before the recent copy is probably unwanted and not to be missed, but I'm being cautious!

(*) I thought I had quite a few 2.5" SATA drives that I've removed from dead laptops over the years, but when I came to look for them, all I could find was the one from my dead HP laptop, hence the need to save its contents just-in-case.

NY

Trying that. It's slow deleting one image at a time, but I'm probably doing something wrong. So far I'm happy. Thanks for the tip.

fred

The photo compare programs actually compare the visual contents, i.e. Visipics would find a series of photos taken with a self-exposure, with a little bit of movement of the people in the picture, between pictures, IYSWIM.

AwesomePhotoFinder also does so, I think. I haven't run into the issue with me having to pay, and the program says it is free on its site... don't know what's going on there.

Thomas Prufer


hashdeep64.exe would be a way of generating a hash for an entire tree. Then you have date, filesize, MD5 to work with.

formatting link
formatting link
# source only

2014: V4.4

formatting link
md5deep-4.4.zip 3.62 MB <=== link is inconvenient to copy, use link on page

Name: md5deep-4.4.zip
Size: 3,792,436 bytes (3703 KiB)
SHA256: D5E85933E74E5BA6A73F67346BC2E765075D26949C831A428166C92772F67DBC

You can unpack just one of the EXEs from the ZIP and use that, perhaps hashdeep64.exe on a 64-bit machine.

*******

# Task is to compare C:\Downloads to L:\Downloads

# First, prepare a filelist of C:\Downloads
# Executable happens to be on L: at the moment.

# Reference to instructions
L:\hashdeep64.exe -h

# Set the working dir
cd /d c:\

# Generate a filelist with hashes, single threaded
# Single threaded -j 1 is suitable for HDD.
# With an SSD, -j 8 job goes faster
# But single thread has better filename ordering.
# -r C:\ for whole partition...
L:\hashdeep64.exe -j 1 -c md5 -r Downloads > L:\audit.txt

The output looks like this. All the zero-sized files should have the same MD5 value.

%%%% size,md5,filename

1908,d1e75542ec8d1b4851765a57ac63618e,C:\$WINDOWS.~BT\Sources\Panther\diagerr.xml
5375,b8b50a45c5d1a80862be54cc7e8765a9,C:\$WINDOWS.~BT\Sources\Panther\diagwrn.xml
7534,46b37bdb08213e86a466942707cdfcb4,C:\$WINDOWS.~BT\Sources\Panther\setupact.log
0,d41d8cd98f00b204e9800998ecf8427e,C:\$WINDOWS.~BT\Sources\Panther\setuperr.log
5411,3bb75bebcf1ba4ca264e07be4f065b75,C:\$Windows.~WS\Sources\Panther\diagerr.xml
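Once a listing like this exists, finding exact duplicates is just a matter of grouping lines that share the same size and MD5. Here is a rough Python sketch of that idea, not something that ships with hashdeep. It assumes the "size,md5,filename" layout shown above and that header lines start with "%" or "#". The function name and the sample paths are made up for illustration, and the text encoding of a real audit.txt may need adjusting when you open it.

```python
from collections import defaultdict


def group_duplicates(lines):
    """Group filenames by (size, md5) from 'size,md5,filename' lines.

    Only groups with more than one filename are returned.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: header/comment lines start with '%' or '#'.
        if not line or line.startswith(("%", "#")):
            continue
        # Split on the first two commas only; filenames may contain commas.
        size, md5, filename = line.split(",", 2)
        groups[(int(size), md5.lower())].append(filename)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical sample in the same layout as the listing above.
SAMPLE = [
    "%%%% size,md5,filename",
    r"0,d41d8cd98f00b204e9800998ecf8427e,C:\example\empty_a.log",
    r"0,d41d8cd98f00b204e9800998ecf8427e,C:\example\empty_b.log",
    r"1908,d1e75542ec8d1b4851765a57ac63618e,C:\example\report, final.xml",
]

for (size, md5), names in group_duplicates(SAMPLE).items():
    print(f"{size} bytes, md5 {md5}:")
    for name in names:
        print("   ", name)
```

For a real run you would pass the open audit.txt file object to group_duplicates() instead of SAMPLE. Zero-length files will all land in one group, so they are best ignored when deciding what to delete.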

Now, we visit the second directory needing a comparison run. L: happens to be better quality storage, so we can try -j 8 speedup. If the two "trees" do not have similar structures, it's possible audit.txt might need some chopping and edits. (It *does* work without editing, but the log will show "moved to..." messages for everything.)

cd /d L:\

hashdeep64.exe -j 8 -c md5 -k audit.txt -a -v -v -r L:\Downloads > audit-out.txt

With only one -v in the command, the output looks like this, and only after editing audit.txt severely so that its paths match the format of the second tree exactly. I corrupted one of the files on purpose, and it did not record that as such. Presumably the changed date on the file prevents it from being detected as a corrupt original.

hashdeep64.exe: Audit failed
   Files matched: 1624
   Files partially matched: 0
   Files moved: 0
   New files found: 12
   Known files not found: 12

On Windows, there are a number of pesky file types on C:. The C: tree has sockets in the SChannel area. A hashdeep run would stop dead if a thread happened to open a socket for read access, as the socket is a stream thing and bytes only come out if you pump bytes in. Windows should not be leaving these just sitting in a directory.

But version 4.4 no longer has parameters for avoiding particular file system traps, such as junction points or sockets. So perhaps the program knows about the pesky bits and you no longer have to lard up the command with yet more letters.

Run from an Administrator Command Prompt, so file access is "less" of a problem. There are always some items not accessible to runs like this.

On Windows 10, disable Windows Defender real-time scanning and the program will go about 8x faster.

It can go screaming fast on an SSD, but not on a HDD. It probably took about 20 minutes to do all of C: as a test. It took about five seconds to verify the Downloads folder (8GB worth).

Paul


Name, size, hash, content-summary

The hash (like MD5 or SHA1) makes some of the fields to the left of it unnecessary. Two things with the same hash are, for practical purposes, identical.

Doing a content-summary (fluffy white cat, fluffy white cat photographed from a 45 degree angle) is a lot harder to do. There are robust math methods (that stink), heuristic methods (weak), or AI methods (???).

Comparing pictures is a bit like the OCR problem. It might work some of the time, but it requires a human to make the tough calls. Noise in the methods means a decision threshold is involved: "how close" do the images have to be, to be counted as "the same"?

The following one makes no attempt to measure pictures which are a matrix transform of one another. A matrix transform is scale, rotate, translate. Well, this method makes a mess of comparing such pictures. But this method might work on two pictures at different resolutions. The output of the tool happens to be base64 (to save space), but that makes it harder to eyeball the outputs. The "fp" in the call here stands for FingerPrint.

findimagedupes -v=fp TranscodedWallpaper Picture111.jpg

/z8ffw7/Hv6e/h78jvqOPo46DhIPEA8QDxAPEA8Qzwk= TranscodedWallpaper
/38ffw7/Hv6e/h78jvqOPo46DhIPEA8QDxAPEA8Qzwk= Picture111.jpg

echo /38ffw7/Hv6e/h78jvqOPo46DhIPEA8QDxAPEA8Qzwk= | base64 -d > 32bytesofbinary

Doing an XOR(a,b) of the 32bytesofbinary for each image, and counting the logic 1's, is an indication of how similar they are. The XOR(a,b) would be all zeros on a perfect match, so the count of logic 1's would be zero.
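The XOR-and-count step can be tried directly on the two fingerprints above. This is only a sketch in Python using the standard library, not how findimagedupes itself does the comparison. The function name and the threshold value are made up for illustration.

```python
import base64


def fingerprint_distance(fp_a: str, fp_b: str) -> int:
    """Count the bits that differ between two base64-encoded fingerprints."""
    a = base64.b64decode(fp_a)
    b = base64.b64decode(fp_b)
    if len(a) != len(b):
        raise ValueError("fingerprints must decode to the same length")
    # XOR each byte pair and count the logic 1's in the result.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# The two fingerprints printed by the tool above.
fp_wallpaper = "/z8ffw7/Hv6e/h78jvqOPo46DhIPEA8QDxAPEA8Qzwk="
fp_photo = "/38ffw7/Hv6e/h78jvqOPo46DhIPEA8QDxAPEA8Qzwk="

# Hypothetical threshold: how many differing bits still count as "similar".
THRESHOLD = 10

distance = fingerprint_distance(fp_wallpaper, fp_photo)
total_bits = 8 * len(base64.b64decode(fp_wallpaper))
print(f"{distance} of {total_bits} bits differ")
if distance <= THRESHOLD:
    print("Similar enough to show both to a human for a decision")
```

For these two strings the base64 text differs in a single character, which works out to one differing bit out of 256. Where to put the threshold is a judgment call and would need tuning on your own photos.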

One of those pictures, is a camera shot of the computer screen while the other picture is on display.

I would not want the computer automatically throwing one of those pictures away, based on that method. But I could pop up the two pictures on the screen, and ask for human intervention.

Summary: Using hashes makes it relatively easy to locate and remove absolutely identical pictures. Finding "similar" pictures is a lot harder to do, reliably.
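For anyone who would rather not install anything, the "absolutely identical" case can be sketched in a few lines of Python. This is a rough illustration under some assumptions, not a polished tool: it uses SHA-256 rather than MD5, it only hashes files that share a size with at least one other file (to keep the slow hashing step down), and it only reports groups. Deleting is left to a human. The function names are made up for illustration.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large photos do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable entry, skip it

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an identical twin
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_sha256(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(same) for same in by_hash.values() if len(same) > 1)
    return groups


# Only scans when a folder is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for group in find_duplicates(sys.argv[1]):
        print("Identical:")
        for path in group:
            print("   ", path)
```

Saved as, say, finddups.py (a hypothetical name), it would be run as "python finddups.py D:\Photos". It says nothing about "similar" pictures, only exact copies.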

Paul


Depends on what you consider to be a duplicate...?

Do you mean the same name, date and checksum, or just visually the same, possibly with a different name etc?

If all the files are in a folder, then turning on large thumbnail view will show each icon as a preview of the picture. That should make spotting visual duplicates easier.

Software like Adobe Bridge is, I think, still free and does not require a CC subscription. That will catalog a large number of folders and show you previews etc.

This is free for personal use:

formatting link

John Rumm
