logreview.sh is a script for manually reviewing logs from popular web servers to quickly find abuse patterns. Apply Occam's razor first: look for the most likely troublemakers.
For a visual demo of how I use LogReview alongside my other tools, watch my BSides CT 2025 presentation, How to fight DDoS attacks from the command line, which gives a more complete picture of the process I use to thwart attacks.
LogReview is still a work in progress (WIP), but it is already functional enough to be helpful. More complete projects such as GoAccess and apachetop exist, but LogReview has its own strengths.
LogReview pairs well with my FirewallBlockGen scripts to identify larger patterns and block addresses in bulk. For automated handling, I recommend the configuration I submitted as a PR for reaction with ipset instead of fail2ban.
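For reference, bulk blocking with ipset generally takes this shape (a minimal sketch; the set name and addresses are illustrative, and reaction or FirewallBlockGen would normally manage this step for you):

# Create a set, fill it with offending addresses, and drop matching traffic.
ipset create logreview-block hash:net
ipset add logreview-block 203.0.113.0/24
ipset add logreview-block 198.51.100.7
iptables -I INPUT -m set --match-set logreview-block src -j DROP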
LogReview lives in the https://github.com/TechnologyClassroom/LogReview/ repository.
LogReview helps to quickly find the most obvious problems with a web server. LogReview may not help when...
- ...botnets and vulnerability scanners intentionally try to "fly under the radar" of log analysis by rotating quickly through IPs. The top user-agent results in logreview.sh may still identify patterns here (a quick manual check is sketched after this list).
- ...user-agents are intentionally randomized. A more thorough review of all user-agents may help uncover further abnormal patterns.
- ...large files are used to hog bandwidth. apachetop or GoAccess help here.
- ...slowloris attacks are used.
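As a quick manual check on user-agents, assuming an Apache or nginx combined log format where the user-agent is the last quoted field (a rough sketch, not necessarily how logreview.sh does it, and the log path is illustrative):

# Count the most common user-agents in a combined-format log.
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head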
Clone LogReview onto the server the first time you are setting it up.
git clone https://github.com/TechnologyClassroom/LogReview logreview
Change to the directory.
cd logreview
Copy the template configuration file (first-time setup only).
cp -n logreview.conf.defaults logreview.conf
Edit the config file to match your log (first-time setup only). The config needs to point to the web server log file and specify which column contains the IP addresses.
editor logreview.conf
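As an illustration only, a configuration for a default Apache combined log would convey something like the following. The variable names here are hypothetical; logreview.conf.defaults documents the real ones.

# Hypothetical example values; check logreview.conf.defaults for the actual variable names.
LOG="/var/log/apache2/access.log"   # path to the web server log
IPCOLUMN="1"                        # column that holds the client IP address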
Run the script.
bash logreview.sh
The output should give you a glimpse of what is happening. From there, you can block individual addresses showing unwanted behavior or use other tools to curb that behavior.
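Blocking a single offender by hand can be as simple as one firewall rule, for example with iptables or nftables (the address below is a documentation example, and the nftables rule assumes an inet filter table and input chain already exist):

# Drop all traffic from one abusive address.
iptables -I INPUT -s 203.0.113.45 -j DROP
# Or on an nftables-based system:
nft add rule inet filter input ip saddr 203.0.113.45 drop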
Make sure you do not take action against automation that you run on your own site, such as site monitoring from Uptime Kuma, Prometheus, or Munin.
Place these addresses and/or user-agents under the grepexclusion section of logreview.sh to exclude known good results.
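Conceptually, excluding known-good entries is just an inverted grep. The pattern below only illustrates the idea; the address, user-agent strings, and log path are examples, and the grepexclusion section in logreview.sh defines where your own patterns actually go.

# Filter out your own monitoring before reviewing the rest.
grep -Ev '203\.0\.113\.10|Uptime-Kuma|Prometheus|Munin' /var/log/apache2/access.log | less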
I recommend taking notes on the research that you do. You will likely see patterns over time, and you may forget the decisions that you previously made. Sometimes your decisions will change based on new information that contradicts how a service presents itself.
After taking action or recognizing known behavior, you can add addresses to the
TMPBLOCK line to continue digging deeper.
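I have not reproduced the exact TMPBLOCK syntax here; logreview.sh itself shows the expected format. The idea is simply a list of addresses you have already handled so later passes skip them, something in the spirit of:

# Hypothetical format: addresses already dealt with, excluded from further digging.
TMPBLOCK="203.0.113.45 198.51.100.7"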
- The top 10 most frequent IP addresses hitting the server likely show what is slowing the server down (a quick one-liner for this check is sketched after this list).
- If an IP address or user-agent is hitting your site at several orders of magnitude more than anything else, look at it first. If it claims to be a normal web browser user-agent, it is likely lying.
- Bots that identify themselves:
  - Bots run by search engines are typically wanted by most sites. Blocking them will affect your page rank or Search Engine Optimization (SEO). I would recommend not blocking known good search engines such as Googlebot, Applebot, bingbot, DuckDuckBot, SeznamBot, YandexBot, MojeekBot, Amazonbot, yacybot, and Yahoo! Slurp unless they are specifically broken.
  - There are exceptions, though. Sometimes bots will identify as a search engine bot while not really having a functional search engine. This is one tactic used by companies building datasets for training generative AI models.
  - I have sometimes seen bad bots impersonate one of the major search engines by using their exact user-agents, so I recommend not excluding those user-agents from results. Many of the major search engines provide documentation for verifying the authenticity of their bots.
  - Common Crawl (CCBot) is an attempt to crawl the entire web collectively so that every company crawling the web could stop and just download the latest Common Crawl archive. Having one bot hit your site would be better than having dozens hit it, right? Reality is a bit different, but the concept is nice. I allow it.
- If a specific version of a web browser is the top user-agent by a significant margin and it is about 10 or more versions behind what you would expect, it is likely some kind of automation. Acquire the list of IPs used by that user-agent on your server and process them with my ip-to-asn-info.sh script from FirewallBlockGen to find out more about where the requests are coming from (see the sketch after this list). You will likely find that it is a bot that does not identify itself.
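Both of the manual checks referenced above boil down to short pipelines. Assuming a combined log format where the client IP is the first column (adjust to your logreview.conf settings), and with illustrative paths and user-agent string:

# Top 10 most frequent client IPs in the log.
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10

# Collect the unique IPs behind one suspicious user-agent, then look up their networks.
# The exact ip-to-asn-info.sh invocation may differ; see the FirewallBlockGen README.
grep 'Firefox/45.0' /var/log/apache2/access.log | awk '{print $1}' | sort -u > suspect-ips.txt
bash ip-to-asn-info.sh suspect-ips.txt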