So, that's the jumping-off point for this article's scripts: analyzing log files to understand what's going on and why.
To start, a handy check is to see how many processes are running, because my DDOS was characterized by a ridiculous number of comment and search scripts being triggered—hundreds a minute. How to check?
The ps command offers a list of running processes at any given time, but for many versions, all you see is the Web server "httpd" without any further details. The -C cmd flag narrows the output down to only the processes running the named command, like this:
$ ps -C httpd
PID TTY TIME CMD
20225 ? 00:13:21 httpd
28162 ? 00:00:01 httpd
...
5681 ? 00:00:00 httpd
5683 ? 00:00:00 httpd <defunct>
(Note the "defunct" process that's about to vanish.) So one easy test is to see how many httpd processes are running:
$ ps -C httpd | wc -l
108
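One caveat about that number: ps prints a header line (the "PID TTY TIME CMD" row shown earlier), and wc -l counts it along with the processes. A minimal sketch of a filter that skips the header follows; the function name count_ps_rows is made up for illustration and is not part of the original scripts.

```shell
#!/bin/sh
# Hypothetical helper: count the process rows in ps output read from
# stdin, skipping the first line (the column header).
count_ps_rows() {
    # n + 0 forces a numeric 0 when there are no process rows at all
    awk 'NR > 1 { n++ } END { print n + 0 }'
}
```

Piping the same command through it, as in ps -C httpd | count_ps_rows, would report one fewer than the wc -l figure. The rest of the article keeps the simpler wc -l form, so its numbers include the header.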
That seems like a lot, but this server is hosting several sites, including the super-busy AskDaveTaylor.com tech-support site, which sees more than 100k hits/day. So how does this vary over time? Hmm...still working on the command line:
$ while /bin/true
> do
> ps -C httpd | wc -l
> sleep 5
> done
108
107
103
99
94
91
87
84
91
121
120
116
So there's a max of 121 and a min of 84. But, what if I actually want to analyze this and get min, max and average over a longer period of time? Here's how I solve it:
#!/bin/sh
# Calculates the number of processes running that match
# a set pattern over time, producing min, max and average.

min=999; max=0; average=0; tally=0; sumtotal=0
pattern="httpd"     # ps -C pattern

while /bin/true
do
  # note: the count includes the header line that ps prints
  count=$(ps -C "$pattern" | wc -l)
  tally=$(( $tally + 1 ))
  if [ $count -gt $max ] ; then
    max=$count
  fi
  if [ $count -lt $min ] ; then
    min=$count
  fi
  sumtotal=$(( $sumtotal + $count ))
  average=$(( $sumtotal / $tally ))
  echo "Current ps count=$count: min=$min, max=$max, tally=$tally and average=$average"
  sleep 5     # seconds
done

exit 0
Notice in the script that I'm not falling into the trap of keeping a running average and somehow factoring in the latest value as a diminishing additive. Instead, I use a sumtotal variable that keeps having the latest process count added. That divided by tally is always the average (rounded down, because shell arithmetic is integer-only), although at some point sumtotal would exceed the largest integer the shell can handle (2**31 - 1 with 32-bit arithmetic, far more with 64-bit) and would start to produce bad results. On a modern computer, however, that should take a while. (And the quantum, the period of time between iterations, also can be adjusted. Five seconds might be too granular for a process that's going to be run for hours or even days.) The following are the first few lines of output. Notice how min and average change as new values come in, while max holds at the first reading:
$ sh processes.sh
Current ps count=132: min=132, max=132, tally=1 and average=132
Current ps count=128: min=128, max=132, tally=2 and average=130
Current ps count=124: min=124, max=132, tally=3 and average=128
Current ps count=123: min=123, max=132, tally=4 and average=126
If I let the script run for a longer period of time, the values become a bit more varied:
Current ps count=90: min=76, max=150, tally=70 and average=107
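Because shell arithmetic works only with integers, the average printed above is always rounded down. As an alternative sketch (not the approach used in this article's script), the raw counts could be saved one per line and summarized afterward with awk, which handles decimals. The function name summarize_counts is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read one process count per line on stdin and
# print min, max and a two-decimal average. Non-numeric lines are skipped.
summarize_counts() {
    awk '
        $1 ~ /^[0-9]+$/ {
            v = $1 + 0                          # force numeric value
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "min=%d max=%d avg=%.2f n=%d\n", min, max, sum / n, n
        }
    '
}
```

Fed the twelve values from the earlier five-second loop (108 through 116), it prints min=84 max=121 avg=101.75 n=12.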
During the 15 minutes or so that I ran the script, an average of 107 "httpd" processes were running, with a minimum of 76 and a max of 150. Armed with that information, another script could keep an eye on things via a cron job, like this:
#!/bin/sh
# DDOS - keep an eye on process count to
# detect a blossoming DDOS attack

pattern="httpd"
max=200      # avoid false positives
admin="d1taylor@gmail.com"

count="$(ps -C "$pattern" | wc -l)"

if [ $count -gt $max ] ; then
  echo "Warning: DDOS in process? Current httpd count = $count" | sendmail $admin
fi

exit 0
That's a superficial solution, however, and it has two problems: 1) what I'd really like is to be able to identify the potential DDOS based on process count and watch to see if it's sustained over the next few invocations of the script, and 2) once it's triggered, if it is a DDOS, in addition to everything else, I'll also start drowning in e-mail from this script saying essentially the same thing each time. Not good. What the script needs is contextual memory so it can differentiate between a sudden spike in traffic and a persistent DDOS attack. In the former case, the script might trigger once, then the next time it runs, everything is within acceptable limits again. In the latter case, once the attack starts, it'll probably just accelerate.
The e-mail non-repeat condition pulls the opposite way, though: there, I want the script to remember that an e-mail already has been sent and not send another within, say, a 60-minute window.
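To make those two requirements concrete, here is a rough sketch of one way the memory could work, using a small state file that survives between cron runs. Everything in it is an assumption for illustration rather than the finished solution: the function name should_alert, the state-file format (two integers: consecutive high readings and the time of the last e-mail), the three-run threshold, and passing the current time in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (true) only when the count has been
# over the limit for several runs in a row AND no alert has been sent
# within the quiet period. State is kept in a file between invocations.
# Usage: should_alert COUNT MAX STATEFILE NOW_IN_EPOCH_SECONDS
should_alert() {
    count=$1; max=$2; statefile=$3; now=$4
    needed=3       # assumed: consecutive high readings before alerting
    quiet=3600     # assumed: seconds between e-mails (60 minutes)

    streak=0; last=0
    if [ -f "$statefile" ] ; then
        read -r streak last < "$statefile"
    fi
    streak=${streak:-0}; last=${last:-0}   # tolerate an empty file

    if [ "$count" -gt "$max" ] ; then
        streak=$(( streak + 1 ))           # still high: extend the streak
    else
        streak=0                           # back to normal: just a spike
    fi

    send=1                                 # 1 = do not alert
    if [ "$streak" -ge "$needed" ] && [ $(( now - last )) -ge "$quiet" ] ; then
        last=$now                          # remember when we alerted
        send=0
    fi

    echo "$streak $last" > "$statefile"
    return "$send"
}
```

In the cron script, the test could then become something like if should_alert "$count" "$max" /var/tmp/ddos.state "$(date +%s)", where the state-file path is made up and date +%s is assumed to be supported (it is on GNU and BSD date). A brief spike resets the streak before it reaches three, and a sustained attack produces at most one e-mail per hour.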
I'll dig in to both of those situations another time. For now, I need to get back to my server and keep bringing things back on-line, program by program, to try to avoid any problems. Stay tuned!