Could you please give a bit more detail on this, as in define "clean" as you use it here?
rebuild_zebra.pl works in two stages: (1) it exports all or queued records to a file; (2) it feeds the exported file to the Zebra indexer (the zebraidx command). The -nosanitize option modifies the first stage. Without this option, during stage 1, records are 'sanitized' before being written to the file, i.e. their leader is fixed, the biblionumber is checked, UNIMARC tag 100 is forced to UTF-8, and a few other things. This 'sanitizing' requires reading each record, parsing it into a Perl object, manipulating the object, and finally formatting it back into XML. This consumes CPU and memory, and takes time. With the -nosanitize option, records are read from MySQL and written directly to the export file. This drastically decreases the time rebuild_zebra.pl spends in stage 1. From this perspective, a 'clean' record is a record which doesn't need to be sanitized: leader OK, correct record id, etc. By the way, coming back to the initial question, it could also be interesting to improve the performance of stage 2, that is, the raw indexing performance of Zebra itself.
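To make the difference in stage 1 concrete, here is a rough sketch in Python (not the actual Perl code of rebuild_zebra.pl). The function names, the un-namespaced XML, the padding of the leader to 24 characters, and the use of control field 001 for the record id are all assumptions made for illustration; the real fixes differ. It only shows why one path costs a parse and a re-serialization per record while the other is a plain copy.

```python
import xml.etree.ElementTree as ET

LEADER_LENGTH = 24  # a MARC leader is always 24 characters long


def sanitize(marcxml, record_id):
    """Costly path: parse the XML, fix the record, serialize it again.

    The two fixes below are simplified stand-ins for the real ones.
    """
    record = ET.fromstring(marcxml)

    # Stand-in for "leader is fixed": force the leader to 24 characters.
    leader = record.find("leader")
    if leader is None:
        leader = ET.SubElement(record, "leader")
    leader.text = (leader.text or "")[:LEADER_LENGTH].ljust(LEADER_LENGTH)

    # Stand-in for "biblionumber is checked": store the id in field 001
    # (hypothetical location, chosen only for this sketch).
    id_field = record.find("controlfield[@tag='001']")
    if id_field is None:
        id_field = ET.SubElement(record, "controlfield", {"tag": "001"})
    id_field.text = record_id

    return ET.tostring(record, encoding="unicode")


def export(rows, sanitize_records=True):
    """Stage 1: turn (record_id, marcxml) rows into the export file content."""
    exported = []
    for record_id, marcxml in rows:
        if sanitize_records:
            exported.append(sanitize(marcxml, record_id))
        else:
            # Equivalent of -nosanitize: the stored XML is copied as-is.
            exported.append(marcxml)
    return "\n".join(exported)
```

With sanitize_records=False the stored XML goes straight to the output, so the result is only as good as what is in the database. That is why the shortcut is safe only for 'clean' records.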