Mainframe, MVS and zOS Discussion

Posted: **Mon Nov 09, 2015 1:30 pm**

Hi,

To remove duplicates from a file we can use SUM FIELDS=NONE in sort. But I don't understand it, why it removes dulicates? If I'll provide some values for SUM FIELDS, it will sum the number in that place then how it removes duplicates with NONE? Can anyone please explain this concept?

Posted: **Mon Nov 09, 2015 1:49 pm**

Please check this link about SUM function
http://publibz.boulder.ibm.com/cgi-bin/ ... 0615185603

Posted: **Mon Nov 09, 2015 4:37 pm**

If you specify one or more fields on the SUM statement, for records with non-unique keys (from the SORT statement) those fields are summed, and one single record is output with the sums. The first record to arrive at the SUM is the one which is used as the base for the output record.

If you specify FIELDS=NONE, the actually summing is not done, and with a non-unique key the first record of each key which arrives at the SUM processing is the one which is written out.

If your data is already in sequence by the key, then using SORT and SUM is a very inefficient way of removing duplicate keys.

Posted: **Sat Nov 21, 2015 9:43 am**

William,

if you are positive that the data is in sequence already, then you can use

MERGE FIELDS=(.....)
SUM FIELDS=NONE

this way you can avoid the overhead of sorting the data again.

Posted: **Sat Nov 21, 2015 3:04 pm**

Yes, Chuck.

It has to be with a single file, SORTIN01 replacing SORTIN. Use of EQUALS if required to preserve order (if "base" record details are significant and the first record is wanted for the base).

The OUTFIL reporting features can also be used, REMOVECC,NODETAIL and SECTIONS with TRAILER3 (based on last record) or HEADER3 (based on first record).

The point was in the inefficiency of using SORT just to allow the SUM.

There's also ICETOOL's SELECT operator which provides further options (like keeping the discards, only keeping unique/non-unique keys, keeping the first n of a set of keys, etc).

Posted: **Fri Dec 04, 2015 11:24 am**

William Collins wrote:If you specify FIELDS=NONE, the actually summing is not done, and with a non-unique key the first record of each key which arrives at the SUM processing is the one which is written out.

If actual summing is not done then what exactly happens? Does SORT "ignore" to sum when it sees the keyword NONE? Or it calls some another routine, as a "program"?

Posted: **Fri Dec 04, 2015 11:28 am**

Read the above posts again and have some questions:

William Collins wrote:If your data is already in sequence by the key, then using SORT and SUM is a very inefficient way of removing duplicate keys.

William Collins wrote:There's also ICETOOL's SELECT operator which provides further options (like keeping the discards, only keeping unique/non-unique keys, keeping the first n of a set of keys, etc).

So to remove duplicates I should prefer to use ICETOOL's SELECT operator?

Posted: **Fri Dec 04, 2015 3:58 pm**

As part of a SUM operation there is the dropping of duplicate keys. FIELDS=NONE on SUM just does that part, as there is nothing to SUM. If you specify something to SUM, it is summed, and the other duplicates of the first record are still dropped.

If you need something which SELECT provides easily which is not provided easily elsewhere, then SELECT is a sensible choice.

Mainframe, MVS and zOS Discussion

SUM FIELDS=NONE in sort.

SUM FIELDS=NONE in sort.

Re: SUM FIELDS=NONE in sort.

Re: SUM FIELDS=NONE in sort.

Re: SUM FIELDS=NONE in sort.

Re: SUM FIELDS=NONE in sort.

Re: SUM FIELDS=NONE in sort.

Re: SUM FIELDS=NONE in sort.

Re: SUM FIELDS=NONE in sort.