While the default format for Apache logs has some benefits (the CLF is understood by many log analyzers), it’s very awkward to read (only one space separating entries) and difficult to parse on the command line (spaces everywhere, and no hostname in the actual lines, which makes parsing multiple files a slight hassle).
As I had to change the log format anyway to include the response time (for identifying slow requests), I changed it to the following:
#ServerName date time port r_ip status rtime request referer user_agent rsize
LogFormat "[%v] %{%F %T}t %p %a %>s %D %r %{Referer}i %{User-agent}i %B" combined
The big difference? The actual fields are tab-delimited. Thus they can be easily parsed by cut -f, which avoids all the awkward awk/grep/sed hassle of the CLF, or imported into spreadsheet software (as tab-delimited CSV) for visualization. This saves a lot of time when someone needs ad-hoc statistics again and I don’t want to run webalizer or something similar over the aggregated logfiles. Also, the bigger space between each entry makes it much easier to read manually.
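For ad-hoc statistics that go a bit beyond cut -f, the same tab splitting carries over to a few lines of Python. The following is only a sketch under some assumptions: the column names in FIELDS are my own labels taken from the header comment, date and time are assumed to share one tab-delimited column, and the sample lines and the one-second threshold are invented for illustration. The only fact relied on from Apache is that %D logs the response time in microseconds.

```python
# Column labels are an assumption derived from the header comment of the
# LogFormat directive; date and time are assumed to sit in one column.
FIELDS = ["server", "timestamp", "port", "r_ip", "status",
          "rtime", "request", "referer", "user_agent", "rsize"]


def parse_log(lines):
    """Yield one dict per well-formed tab-delimited log line."""
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if len(row) != len(FIELDS):
            continue  # skip truncated or otherwise malformed lines
        yield dict(zip(FIELDS, row))


def slow_requests(lines, threshold_us=1_000_000):
    """Return entries whose response time (%D, in microseconds) exceeds the threshold."""
    return [e for e in parse_log(lines) if int(e["rtime"]) > threshold_us]


# Invented sample data, not taken from a real server.
SAMPLE = (
    "[example.org]\t2012-01-01 12:00:00\t80\t192.0.2.1\t200\t2500000\t"
    "GET /report HTTP/1.1\t-\tcurl/7.0\t512\n"
    "[example.org]\t2012-01-01 12:00:01\t80\t192.0.2.2\t200\t1200\t"
    "GET / HTTP/1.1\t-\tcurl/7.0\t10\n"
)

if __name__ == "__main__":
    for entry in slow_requests(SAMPLE.splitlines()):
        print(entry["timestamp"], entry["rtime"], entry["request"])
```

In practice you would pass an open logfile instead of SAMPLE.splitlines(), since parse_log accepts any iterable of lines.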
(Since LogFormat directives can override each other, deployment can be reduced to “throw two lines into a file in apache2/conf.d, throw file into one of our deb packages, update package on servers”.)
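As a rough illustration of those two lines, here is a small Python sketch that assembles the header comment and the directive from a list of columns. It is not how the file was actually produced. It writes the separators as the \t escape, which mod_log_config turns into a tab in the log output; whether the original file used that escape or literal tab characters is an assumption on my part. COLUMNS and build_conf are hypothetical names, and writing the result to a file under apache2/conf.d is left out on purpose.

```python
# Hypothetical (label, format string) pairs; the labels mirror the header
# comment and the format strings are the ones used in the directive.
COLUMNS = [
    ("ServerName", "[%v]"),
    ("date time", "%{%F %T}t"),
    ("port", "%p"),
    ("r_ip", "%a"),
    ("status", "%>s"),
    ("rtime", "%D"),
    ("request", "%r"),
    ("referer", "%{Referer}i"),
    ("user_agent", "%{User-agent}i"),
    ("rsize", "%B"),
]


def build_conf(columns, nickname="combined"):
    """Return the two-line config snippet: header comment plus LogFormat."""
    header = "#" + " ".join(label for label, _ in columns)
    # "\\t" is the two characters backslash and t, not a literal tab;
    # Apache expands that escape to a tab when it writes the log.
    fmt = "\\t".join(spec for _, spec in columns)
    return f'{header}\nLogFormat "{fmt}" {nickname}\n'


if __name__ == "__main__":
    print(build_conf(COLUMNS), end="")
```

Keeping labels and format strings in one list makes it harder for the header comment and the directive to drift apart when a field is added later.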