Removing a large quantity of duplicates from a database

Question asked by Jack Brooker on May 22, 2018
We're currently moving our data from a basic Windows Explorer-based system to PDM, and I'm going through the process of sorting it. Unfortunately, our data is pretty crap, to be quite blunt, and around 60-70% of it is duplicate parts.




The image shows some of our standard part files. If we didn't have that many, there wouldn't be much of an issue with me manually going through it all and removing each duplicate part, but unfortunately this trend continues throughout the entire lot. The main problem isn't that there are vast numbers of duplicates of a single part, but rather that there are a few duplicates of seemingly every single part in the folder.

I'm looking for a way to remove these. I don't have much of a problem writing a script to do the job, but I've had problems running scripts inside PDM before, and I'm not hugely experienced with it. Is there an easier way of doing this within PDM that I don't know of, or possibly a program designed to do this job for me? Keep in mind these parts are spread across multiple folders at multiple levels, with little to no structure whatsoever.


I've had a look at the Duplicate files thread here as well. It covers re-referencing files, which I'm already happy with. I just need to remove the duplicates so I can check in my standard parts and begin re-referencing.
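
For reference, the kind of script I have in mind is roughly the sketch below. It only reports candidates and deletes nothing. It rests on a few assumptions: it runs against the plain Windows folders before anything goes into the vault, it treats two files as duplicates only when they are byte-for-byte identical (copies that were re-saved separately may not match even if the geometry is the same), and the `.sldprt` filter and the function names are just placeholders I made up.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, extensions=(".sldprt",)):
    """Walk every folder under root and group byte-identical files.

    Returns a list of groups; each group is a sorted list of paths
    whose contents are identical. Nothing is deleted.
    """
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates` on the top-level parts folder would give a list of groups to review by hand, keeping one file from each group, before checking anything in.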