Rejection log sorting script
Version up on Oct 27, 2014 (the former version is here.)
Here is a shell script useful for finding out mail servers which are mistakenly rejected by the S25R anti-spam system. If your computer combines a web server with a mail server, you can easily watch rejection records with a web browser by means of installing this script together with a password under a directory under the cgi-bin directory. You can also run it as a command.
Function
This script inputs a Postfix mail log and extracts records of rejection with the response code "4XX" (meaning "try again later"; "XX" is a couple of numerals) by client restriction (rejections by other reasons than S25R are also extracted), and displays them sorted so that the records of retry accesses are arranged in a sequence. That is, accesses with the same client IP address, sender address and recipient address are arranged in a sequence. If all of them are not the same, the records are separated by a blank line with each other.
The reason codes for the rejections are displayed with the following characters:
- C: Client host rejected (by S25R)
- B: Client host [...] blocked (by DNSBL; in case you set maps_rbl_reject_code as 4XX and use reject_rbl_client)
- S: Sender address rejected (in case you use reject_unknown_sender_domain)
- R: Recipient address rejected (in case you use reject_unknown_recipient_domain)
- H: Helo command rejected (in case you use reject_unknown_helo_hostname)
- O: Other
It also displays the following data at the end:
- access count: Indicates the total number of extracted accesses.
- estimated message count: Indicates the number of messages in counting accesses with the same client IP address and recipient address as one message. It is supposed to be near to the number of spam messages which might have been received if you didn't take the anti-spam measure. The reason why it is counted without considering the difference of the sender address is that one spam message may retry changing its sender address.
- retry sequence count: Indicates the number of sequences of retry accesses. When it is not 0, you should investigate retry accesses.
Practical use
Legitimate mail servers always retry transfer at appropriate intervals against rejection with the response code "4XX". The records of those rejections are displayed in a sequence with this script. Therefore, it will help you to find out accesses from legitimate mail servers which should be listed in the white list. (Note that you cannot receive blocked messages by S25R whitelisting unless the reason code is "C" or "B".)
If accesses displayed in a sequence satisfy all of the following conditions, the client may be a legitimate mail server which should be listed in the white list.
- The retry intervals are 1 minute or longer but up to 4 hours. (However, when one sender sends multiple messages to one recipient, the apparent intervals may be short.)
- The retry has been lasting for 30 minutes or more.
- The recipient address is correct.
Meanwhile, if accesses displayed in a sequence fall under any of the following conditions, the client is probably or surely illegitimate.
- The retry intervals are less than 1 minute.
- The retry didn't last for 30 minutes.
- The HELO address is your mail server's IP address or the recipient's domain name.
- The HELO address is an IP address but it is not enclosed by brackets ( [ ] ).
- The HELO address changes.
In case you are hard to decide whether the client is legitimate or not, you should once list it on the white list, and then unlist it if the recipient complains of spam.
Necessary configuration
You need to configure the access mode of the mail log files so that they are readable with the authority of the HTTP daemon. In many systems, you can configure it with the commands as follows:
-
chgrp nobody /var/log/maillog*
chmod g+r /var/log/maillog*
In case the user/group name of your HTTP daemon is "daemon", type "daemon" instead of "nobody" described above.
Alterations
- Mail log file names
This script is coded to input the mail log files in the last week and this week applying to my system environment in which the mail log files are rotated every week. If necessary, alter the mail log file names in the script according to your system environment or the period of time you want to watch.
- To suppress single access records
A record of an access which has not been retried is displayed in a single line separated by a blank line with each other. It may be sometimes helpful for you to find the client to be illegitimate even if it has apparently retried transfer.
However, if too many single access records disturb your watch, you can suppress them. In the GAWK script which is the 11th process, rewrite the line:
-
Suppress_single_access_records=0
into:
-
Suppress_single_access_records=1
Even if you suppress single access records, the indication of the access count, estimated message count and retry sequence count is the same.
(You can freely alter this script and publish your alteration.)
Shell script code
(Plain text version; long lines are folded)
#!/bin/sh
echo "Content-Type: text/plain"
echo
echo "Mail rejection log (4XX) - sorted;" \
"Reason code: C=Client, B=DNSBL, S=Sender, R=Recipient, H=HELO, O=Other"
echo
#
# (1) Input mail log.
#
cat /var/log/maillog.1 /var/log/maillog | \
#
# (2) Extract records indicating "4XX".
#
egrep 'reject: RCPT from [^]]+\]: 4[0-9][0-9] ' | \
#
# (3) Extract essential items.
#
gawk '
{
client=substr($0, match($0, /from [^]]+\]/)+5, RLENGTH-5)
sub(/\[/, " [", client)
sender=substr($0, match($0, /from=<[^>]*>/), RLENGTH)
rcpt=substr($0, match($0, /to=<[^>]*>/), RLENGTH)
helo=substr($0, match($0, /helo=<[^>]*>/), RLENGTH)
if (match($0, /Client host rejected/))
reason="C"
else if (match($0, /Client host .+ blocked/))
reason="B"
else if (match($0, /Sender address rejected/))
reason="S"
else if (match($0, /Recipient address rejected/))
reason="R"
else if (match($0, /Helo command rejected/))
reason="H"
else
reason="O"
printf "%s %2d %s %s %s %s %s %s\n", \
$1, $2, $3, reason, client, sender, rcpt, helo
}
' | \
#
# (4) Convert month names into month numbers.
#
gawk '
BEGIN {
month_num["Jan"]=1
month_num["Feb"]=2
month_num["Mar"]=3
month_num["Apr"]=4
month_num["May"]=5
month_num["Jun"]=6
month_num["Jul"]=7
month_num["Aug"]=8
month_num["Sep"]=9
month_num["Oct"]=10
month_num["Nov"]=11
month_num["Dec"]=12
max_month_num=0
}
{
$1=month_num[$1]
if ($1>max_month_num)
max_month_num=$1
else if ($1<max_month_num)
$1+=12
printf "%3d %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
}
' | \
#
# (5) Sort according to IP address, sender address and recipient address.
#
sort -k 6,8 | \
#
# (6) Insert a blank line between records with a different triplet.
#
gawk '
BEGIN {
prev_triplet=""
}
{
if (prev_triplet!="") {
if (prev_triplet!=$6 $7 $8)
print ""
}
print
prev_triplet=$6 $7 $8
}
' | \
#
# (7) Convert retry records in a sequence into one line.
#
gawk '
BEGIN {
RS=""
}
{
gsub(/\n/, "\036")
print
}
' | \
#
# (8) Sort according to date and time.
#
sort -k 1,3 | \
#
# (9) Reconvert retry records in a sequence into multiple lines.
#
gawk '
{
gsub(/\036/, "\n")
print
print ""
}
' | \
#
# (10) Reconvert month numbers into month names.
#
gawk '
BEGIN {
month_name[1]="Jan"
month_name[2]="Feb"
month_name[3]="Mar"
month_name[4]="Apr"
month_name[5]="May"
month_name[6]="Jun"
month_name[7]="Jul"
month_name[8]="Aug"
month_name[9]="Sep"
month_name[10]="Oct"
month_name[11]="Nov"
month_name[12]="Dec"
}
{
if ($0!="") {
$1=month_name[($1-1)%12+1]
printf "%s %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
}
else
print ""
}
' | \
#
# (11) Output sorted records with counting.
#
gawk '
BEGIN {
Suppress_single_access_records=0
RS=""
acc_count=0
host_and_rcpt=""
msg_count=0
seq_count=0
}
{
retry_count=gsub(/\n/, "\n")
acc_count+=1+retry_count
if (index(host_and_rcpt, $6 $8)==0) {
++msg_count
host_and_rcpt=$6 $8 host_and_rcpt
}
if (retry_count>0)
++seq_count
if (!(retry_count==0 && Suppress_single_access_records)) {
print
print ""
}
}
END {
print "access count =", acc_count, \
", estimated message count =", msg_count, \
", retry sequence count =", seq_count
}
'
|
(HTML <PRE> tag verision; long lines should be horizontally scrolled)
#!/bin/sh
echo "Content-Type: text/html"
echo
echo "<html><body><pre>"
echo "Mail rejection log (4XX) - sorted;" \
"Reason code: C=Client, B=DNSBL, S=Sender, R=Recipient, H=HELO, O=Other"
echo
#
# (1) Input mail log.
#
cat /var/log/maillog.1 /var/log/maillog | \
#
# (2) Extract records indicating "4XX".
#
egrep 'reject: RCPT from [^]]+\]: 4[0-9][0-9] ' | \
#
# (3) Extract essential items.
#
gawk '
{
client=substr($0, match($0, /from [^]]+\]/)+5, RLENGTH-5)
sub(/\[/, " [", client)
sender=substr($0, match($0, /from=<[^>]*>/), RLENGTH)
rcpt=substr($0, match($0, /to=<[^>]*>/), RLENGTH)
helo=substr($0, match($0, /helo=<[^>]*>/), RLENGTH)
if (match($0, /Client host rejected/))
reason="C"
else if (match($0, /Client host .+ blocked/))
reason="B"
else if (match($0, /Sender address rejected/))
reason="S"
else if (match($0, /Recipient address rejected/))
reason="R"
else if (match($0, /Helo command rejected/))
reason="H"
else
reason="O"
printf "%s %2d %s %s %s %s %s %s\n", \
$1, $2, $3, reason, client, sender, rcpt, helo
}
' | \
#
# (4) Convert month names into month numbers.
#
gawk '
BEGIN {
month_num["Jan"]=1
month_num["Feb"]=2
month_num["Mar"]=3
month_num["Apr"]=4
month_num["May"]=5
month_num["Jun"]=6
month_num["Jul"]=7
month_num["Aug"]=8
month_num["Sep"]=9
month_num["Oct"]=10
month_num["Nov"]=11
month_num["Dec"]=12
max_month_num=0
}
{
$1=month_num[$1]
if ($1>max_month_num)
max_month_num=$1
else if ($1<max_month_num)
$1+=12
printf "%3d %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
}
' | \
#
# (5) Sort according to IP address, sender address and recipient address.
#
sort -k 6,8 | \
#
# (6) Insert a blank line between records with a different triplet.
#
gawk '
BEGIN {
prev_triplet=""
}
{
if (prev_triplet!="") {
if (prev_triplet!=$6 $7 $8)
print ""
}
print
prev_triplet=$6 $7 $8
}
' | \
#
# (7) Convert retry records in a sequence into one line.
#
gawk '
BEGIN {
RS=""
}
{
gsub(/\n/, "\036")
print
}
' | \
#
# (8) Sort according to date and time.
#
sort -k 1,3 | \
#
# (9) Reconvert retry records in a sequence into multiple lines.
#
gawk '
{
gsub(/\036/, "\n")
print
print ""
}
' | \
#
# (10) Reconvert month numbers into month names.
#
gawk '
BEGIN {
month_name[1]="Jan"
month_name[2]="Feb"
month_name[3]="Mar"
month_name[4]="Apr"
month_name[5]="May"
month_name[6]="Jun"
month_name[7]="Jul"
month_name[8]="Aug"
month_name[9]="Sep"
month_name[10]="Oct"
month_name[11]="Nov"
month_name[12]="Dec"
}
{
if ($0!="") {
$1=month_name[($1-1)%12+1]
printf "%s %2d %s %s %s %s %s %s %s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9
}
else
print ""
}
' | \
#
# (11) Output sorted records with counting.
#
gawk '
BEGIN {
Suppress_single_access_records=0
RS=""
acc_count=0
host_and_rcpt=""
msg_count=0
seq_count=0
}
{
retry_count=gsub(/\n/, "\n")
acc_count+=1+retry_count
if (index(host_and_rcpt, $6 $8)==0) {
++msg_count
host_and_rcpt=$6 $8 host_and_rcpt
}
if (retry_count>0)
++seq_count
if (!(retry_count==0 && Suppress_single_access_records)) {
print
print ""
}
}
END {
print "access count =", acc_count, \
", estimated message count =", msg_count, \
", retry sequence count =", seq_count
}
' | \
#
# (12) Convert "<" and ">" into entities.
#
gawk '
{
gsub(/</, "\\<")
gsub(/>/, "\\>")
print
}
'
echo "</pre></body></html>"
|