
Keep the SORT order in output as input while removing the duplicates.

Posted: Wed Sep 21, 2016 11:59 am
by Satish Yadaw
Hi,

For a requirement, I want to delete the duplicate records from a file, but I do not want to sort the records in the output file. They should appear in the output in the same order as they are in the input.

Is this even possible in SORT? By default it always sorts the records.

Re: Keep the SORT order in output as input while removing the duplicates.

Posted: Wed Sep 21, 2016 1:38 pm
by Chandan Yadav
I can think of a two-step solution here:

1. Remove the duplicates using your key. In INREC, add a sequence number at the end of each record, and write the output file with the sequence number included.

2. Sort the output file from step 1 on the sequence number and write the final output file. Remove the sequence number in OUTREC.
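
A minimal, untested sketch of these two steps as DFSORT control statements. The layout is an assumption, not something stated in the thread: fixed-length 80-byte records with the key in columns 1-10, so the sequence number lands in columns 81-88. EQUALS is added so that, among duplicates, the first record in input order is the one kept.

```
* Step 1: tag each record with its input position, then drop duplicates
  INREC OVERLAY=(81:SEQNUM,8,ZD)
  SORT FIELDS=(1,10,CH,A),EQUALS
  SUM FIELDS=NONE

* Step 2 (separate job step): restore input order, strip the sequence number
  SORT FIELDS=(81,8,ZD,A)
  OUTREC BUILD=(1,80)
```

The intermediate file from step 1 would be 88 bytes long under these assumptions. Adjust the positions and lengths to your own record layout.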

Thanks,
Chandan

Re: Keep the SORT order in output as input while removing the duplicates.

Posted: Fri Sep 23, 2016 10:07 pm
by William Collins
Don't sort the data twice when you don't even need to do it once.

Now, you may have to do one SORT if your duplicate keys are not contiguous but you want them treated as duplicates. In that case you do have to do two SORTs, as outlined above.

However, you also may want to deduplicate contiguous keys whilst leaving the same key value elsewhere untouched, in which case SORT is an extremely bad thing to do.

Assuming you don't need to SORT, how about WHEN=GROUP with KEYBEGIN and PUSH SEQ (long enough to cover the maximum number of duplicates)? Then OUTFIL INCLUDE= for "one" in the sequence number, and BUILD to drop the sequence number.
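
Something along these lines, untested, and again assuming (my assumption, not from the original question) fixed-length 80-byte records with the key in columns 1-10:

```
* Copy only, no sort: number the records within each run of equal keys
  OPTION COPY
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,10),PUSH=(81:SEQ=8))
* Keep the first record of each run and drop the sequence number
  OUTFIL INCLUDE=(81,8,ZD,EQ,1),BUILD=(1,80)
```

KEYBEGIN starts a new group each time the key value changes, and SEQ restarts at 1 for each group. So for input keys A, A, B, A the output would be A, B, A: only contiguous duplicates are removed, and the later A is left alone.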