logreview.sh is a script for manually reviewing logs from popular web servers to quickly find abuse patterns. Apply Occam's razor first: look for the most likely troublemakers.
For a visual demo of how I use LogReview alongside my other tools, watch my BSides CT 2025 presentation, How to fight DDoS attacks from the command line, which gives a more complete picture of the process I use to thwart attacks.
LogReview is still a work in progress (WIP), but it is already functional enough to be helpful. More complete projects such as GoAccess and apachetop exist, but LogReview has its own strengths.
LogReview pairs well with my FirewallBlockGen scripts to identify larger patterns and block addresses in bulk. For automated handling, I recommend the configuration I submitted as a PR for reaction with ipset instead of fail2ban.
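For reference, bulk blocking with ipset generally takes this shape (a minimal sketch; the set name and addresses are illustrative, and reaction or FirewallBlockGen would normally manage this step for you):

# Create a set, fill it with offending addresses, and drop matching traffic.
ipset create logreview-block hash:net
ipset add logreview-block 203.0.113.0/24
ipset add logreview-block 198.51.100.7
iptables -I INPUT -m set --match-set logreview-block src -j DROP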
LogReview lives in the https://github.com/TechnologyClassroom/LogReview/ repository.
LogReview helps to quickly find the most obvious problems with a web server. LogReview may not help when...
- ...botnets and vulnerability scanners intentionally try to "fly under the radar" of log analysis by rotating quickly through IPs. The top user-agent results in logreview.sh may still identify patterns here (a quick manual check is sketched after this list).
- ...user-agents are intentionally randomized. A more thorough review of all user-agents may help uncover further abnormal patterns.
- ...large files are used to hog bandwidth. apachetop or GoAccess help here.
- ...slowloris attacks are used.
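As a quick manual check on user-agents, assuming an Apache or nginx combined log format where the user-agent is the last quoted field (a rough sketch, not necessarily how logreview.sh does it, and the log path is illustrative):

# Count the most common user-agents in a combined-format log.
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head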
Clone LogReview onto the server the first time you are setting it up.
git clone https://github.com/TechnologyClassroom/LogReview logreview
Change to the directory.
cd logreview
Copy the template configuration file (first-time setup only).
cp -n logreview.conf.defaults logreview.conf
Edit the config file to match your log (first-time setup only). The config needs to point to the web server log file and specify which column contains the IP addresses.
editor logreview.conf
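As an illustration only, a configuration for a default Apache combined log would convey something like the following. The variable names here are hypothetical; logreview.conf.defaults documents the real ones.

# Hypothetical example values; check logreview.conf.defaults for the actual variable names.
LOG="/var/log/apache2/access.log"   # path to the web server log
IPCOLUMN="1"                        # column that holds the client IP address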
Run the script.
bash logreview.sh
The output should give you a glimpse of what is happening. From there, you can block individual addresses showing unwanted behavior or use other tools to curb that behavior.
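Blocking a single offender by hand can be as simple as one firewall rule, for example with iptables or nftables (the address below is a documentation example, and the nftables rule assumes an inet filter table and input chain already exist):

# Drop all traffic from one abusive address.
iptables -I INPUT -s 203.0.113.45 -j DROP
# Or on an nftables-based system:
nft add rule inet filter input ip saddr 203.0.113.45 drop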
Make sure you do not take action against automation that you run on your own site, such as site monitoring from Uptime Kuma, Prometheus, or Munin.
Place these addresses and/or user-agents under the grepexclusion section of logreview.sh to exclude known good results.
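Conceptually, excluding known-good entries is just an inverted grep. The pattern below only illustrates the idea; the address, user-agent strings, and log path are examples, and the grepexclusion section in logreview.sh defines where your own patterns actually go.

# Filter out your own monitoring before reviewing the rest.
grep -Ev '203\.0\.113\.10|Uptime-Kuma|Prometheus|Munin' /var/log/apache2/access.log | less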
I recommend taking notes on the research that you do. You will likely see patterns over time, and you may forget the decisions that you previously made. Sometimes your decisions will change based on new information that contradicts how a service presents itself.
After taking action or recognizing known behavior, you can add addresses to the
TMPBLOCK line to continue digging deeper.
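I have not reproduced the exact TMPBLOCK syntax here; logreview.sh itself shows the expected format. The idea is simply a list of addresses you have already handled so later passes skip them, something in the spirit of:

# Hypothetical format: addresses already dealt with, excluded from further digging.
TMPBLOCK="203.0.113.45 198.51.100.7"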
- The top 10 most frequent IP addresses hitting the server likely show what is slowing the server down (a quick one-liner for this check is sketched after this list).
- If an IP address or user-agent is hitting your site at several orders of magnitude more than anything else, look at it first. If it claims to be a normal web browser user-agent, it is likely lying.
- Bots that identify themselves:
  - Bots run by search engines are typically wanted by most sites. Blocking them will affect your page rank or Search Engine Optimization (SEO). I would recommend not blocking known good search engines such as Googlebot, Applebot, bingbot, DuckDuckBot, SeznamBot, YandexBot, MojeekBot, Amazonbot, yacybot, and Yahoo! Slurp unless they are specifically broken.
  - There are exceptions, though. Sometimes bots will identify as a search engine bot while not really having a functional search engine. This is one tactic used by companies building datasets for training generative AI models.
  - I have sometimes seen bad bots impersonate one of the major search engines by using their exact user-agents, so I recommend not excluding those user-agents from results. Many of the major search engines provide documentation for verifying the authenticity of their bots.
  - Common Crawl (CCBot) is an attempt to crawl the entire web collectively so that every company crawling the web could stop and just download the latest Common Crawl archive. Having one bot hit your site would be better than having dozens hit it, right? Reality is a bit different, but the concept is nice. I allow it.
- If a specific version of a web browser is the top user-agent by a significant margin and it is about 10 or more versions behind what you would expect, it is likely some kind of automation. Acquire the list of IPs used by that user-agent on your server and process them with my ip-to-asn-info.sh script from FirewallBlockGen to find out more about where the requests are coming from (see the sketch after this list). You will likely find that it is a bot that does not identify itself.
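Both of the manual checks referenced above boil down to short pipelines. Assuming a combined log format where the client IP is the first column (adjust to your logreview.conf settings), and with illustrative paths and user-agent string:

# Top 10 most frequent client IPs in the log.
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10

# Collect the unique IPs behind one suspicious user-agent, then look up their networks.
# The exact ip-to-asn-info.sh invocation may differ; see the FirewallBlockGen README.
grep 'Firefox/45.0' /var/log/apache2/access.log | awk '{print $1}' | sort -u > suspect-ips.txt
bash ip-to-asn-info.sh suspect-ips.txt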