Page 1 of 1

SUM FIELDS=NONE in sort.

Posted: Mon Nov 09, 2015 1:30 pm
by hp123
Hi,

To remove duplicates from a file we can use SUM FIELDS=NONE in sort. But I don't understand it, why it removes dulicates? If I'll provide some values for SUM FIELDS, it will sum the number in that place then how it removes duplicates with NONE? Can anyone please explain this concept?

Re: SUM FIELDS=NONE in sort.

Posted: Mon Nov 09, 2015 1:49 pm
by Anithab
Please check this link about SUM function
http://publibz.boulder.ibm.com/cgi-bin/ ... 0615185603

Re: SUM FIELDS=NONE in sort.

Posted: Mon Nov 09, 2015 4:37 pm
by William Collins
If you specify one or more fields on the SUM statement, for records with non-unique keys (from the SORT statement) those fields are summed, and one single record is output with the sums. The first record to arrive at the SUM is the one which is used as the base for the output record.

If you specify FIELDS=NONE, the actually summing is not done, and with a non-unique key the first record of each key which arrives at the SUM processing is the one which is written out.

If your data is already in sequence by the key, then using SORT and SUM is a very inefficient way of removing duplicate keys.

Re: SUM FIELDS=NONE in sort.

Posted: Sat Nov 21, 2015 9:43 am
by chaat
William,

if you are positive that the data is in sequence already, then you can use

MERGE FIELDS=(.....)
SUM FIELDS=NONE

this way you can avoid the overhead of sorting the data again.

Re: SUM FIELDS=NONE in sort.

Posted: Sat Nov 21, 2015 3:04 pm
by William Collins
Yes, Chuck.

It has to be with a single file, SORTIN01 replacing SORTIN. Use of EQUALS if required to preserve order (if "base" record details are significant and the first record is wanted for the base).

The OUTFIL reporting features can also be used, REMOVECC,NODETAIL and SECTIONS with TRAILER3 (based on last record) or HEADER3 (based on first record).

The point was in the inefficiency of using SORT just to allow the SUM.

There's also ICETOOL's SELECT operator which provides further options (like keeping the discards, only keeping unique/non-unique keys, keeping the first n of a set of keys, etc).

Re: SUM FIELDS=NONE in sort.

Posted: Fri Dec 04, 2015 11:24 am
by hp123
William Collins wrote:If you specify FIELDS=NONE, the actually summing is not done, and with a non-unique key the first record of each key which arrives at the SUM processing is the one which is written out.
If actual summing is not done then what exactly happens? Does SORT "ignore" to sum when it sees the keyword NONE? Or it calls some another routine, as a "program"?

Re: SUM FIELDS=NONE in sort.

Posted: Fri Dec 04, 2015 11:28 am
by hp123
Read the above posts again and have some questions:
William Collins wrote:If your data is already in sequence by the key, then using SORT and SUM is a very inefficient way of removing duplicate keys.
William Collins wrote:There's also ICETOOL's SELECT operator which provides further options (like keeping the discards, only keeping unique/non-unique keys, keeping the first n of a set of keys, etc).
So to remove duplicates I should prefer to use ICETOOL's SELECT operator?

Re: SUM FIELDS=NONE in sort.

Posted: Fri Dec 04, 2015 3:58 pm
by William Collins
As part of a SUM operation there is the dropping of duplicate keys. FIELDS=NONE on SUM just does that part, as there is nothing to SUM. If you specify something to SUM, it is summed, and the other duplicates of the first record are still dropped.

If you need something which SELECT provides easily which is not provided easily elsewhere, then SELECT is a sensible choice.