This post is more than a year old; some information may no longer be accurate.
grok is a Logstash filter plugin that “is perfect for syslog logs, apache and other webserver logs, mysql logs, and in general, any log format that is generally written for humans and not computer consumption.”
grok = understand (something) intuitively or by empathy. Source: The Oxford English Dictionary
Grok ships with a set of predefined patterns. Assume your Apache HTTP Server configuration contains:
CustomLog ${APACHE_LOG_DIR}/access.log combined
An example log entry:
81.62.38.214 - - [17/Jun/2015:09:27:37 +0200] "GET /wp/ HTTP/1.1" 200 7141 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
In the Logstash configuration we use the predefined grok pattern COMBINEDAPACHELOG, which is defined as:
# Log formats
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
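To see how the composite pattern expands, here is a minimal Python sketch of grok's `%{NAME:field}` substitution. It is an approximation, not Logstash's implementation: the base regexes (IPORHOST, USER, HTTPDATE and so on) are simplified assumptions, while the two composite definitions are copied from above. Each `%{NAME:field}` becomes a named capture group and a bare `%{NAME}` a non-capturing group.

```python
import re

# Simplified stand-ins for grok's base patterns (ASSUMPTIONS for illustration;
# the real Logstash definitions are considerably more thorough).
PATTERNS = {
    "IPORHOST": r"[\w.:-]+",
    "USER": r"[\w.@-]+",
    "HTTPDATE": r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "DATA": r".*?",
    "QS": r'"(?:[^"\\]|\\.)*"',
    # Composite patterns, copied from the definitions above.
    "COMMONAPACHELOG": (
        r'%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] '
        r'"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" '
        r'%{NUMBER:response} (?:%{NUMBER:bytes}|-)'
    ),
    "COMBINEDAPACHELOG": r"%{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}",
}

# Matches %{NAME} and %{NAME:field}.
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern):
    """Recursively replace grok references with plain regex groups."""
    def substitute(m):
        name, field = m.group(1), m.group(2)
        body = expand(PATTERNS[name])
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(substitute, pattern)


COMBINED = re.compile(expand("%{COMBINEDAPACHELOG}"))


def parse(line):
    """Return the captured fields, dropping groups that did not participate."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}


SAMPLE = ('81.62.38.214 - - [17/Jun/2015:09:27:37 +0200] "GET /wp/ HTTP/1.1" 200 7141 '
          '"-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"')

fields = parse(SAMPLE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Note that QS keeps the surrounding double quotes, so `referrer` is `"-"` including the quotes, and `rawrequest` is absent because the first alternative of the request group matched.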
To test the matching, we configure Logstash to read from stdin and write JSON to stdout, and we pass the parsed field clientip to another filter plugin, geoip, which determines the country of origin of the given IP address.
input {
  stdin {}
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  stdout { codec => json }
}
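The pipeline above reads each line from stdin, runs grok and geoip on it, and prints the resulting event as JSON. The Python sketch below mimics only the stdin, grok and JSON part, under stated assumptions: the regex is a hand-written approximation of COMBINEDAPACHELOG, geoip is omitted because it needs a GeoIP database, and fields such as `@timestamp` and `host` are not added. The helper names `process` and `run` are hypothetical.

```python
import json
import re

# Hand-written regex approximating %{COMBINEDAPACHELOG} (an assumption for
# illustration, not the exact expansion Logstash compiles).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?|(?P<rawrequest>.*?))" '
    r'(?P<response>\d+) (?:(?P<bytes>\d+)|-) '
    r'(?P<referrer>"(?:[^"\\]|\\.)*") (?P<agent>"(?:[^"\\]|\\.)*")'
)


def process(line):
    """Filter stage: turn one input line into an event dictionary."""
    event = {"message": line, "@version": "1"}
    match = COMBINED.match(line)
    if match:
        # Only keep groups that actually captured something.
        event.update({k: v for k, v in match.groupdict().items() if v is not None})
    else:
        # Logstash's grok tags unmatched events with _grokparsefailure by default.
        event["tags"] = ["_grokparsefailure"]
    return event


def run(lines):
    """Input stage: any iterable of lines; output stage: one JSON string per event."""
    for raw in lines:
        line = raw.rstrip("\n")
        if line:
            yield json.dumps(process(line))
```

To mimic the `stdin {}` input and `stdout` output, feed it standard input, for example `for doc in run(sys.stdin): print(doc)`. `json.dumps` escapes the double quotes inside `message`, `referrer` and `agent` as `\"`.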
This will result in:
cinhtau@edge:~/test/logstash/bin$ ./logstash -f apache.conf
Logstash startup completed
81.62.38.214 - - [17/Jun/2015:09:27:37 +0200] "GET /wp/ HTTP/1.1" 200 7141 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
{"message":"81.62.38.214 - - [17/Jun/2015:09:27:37 +0200] \"GET /wp/ HTTP/1.1\" 200 7141 \"-\" \"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0\"","@version":"1","@timestamp":"2015-06-22T12:15:42.373Z","host":"edge","clientip":"81.62.38.214","ident":"-","auth":"-","timestamp":"17/Jun/2015:09:27:37 +0200","verb":"GET","request":"/wp/","httpversion":"1.1","response":"200","bytes":"7141","referrer":"\"-\"","agent":"\"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0\"","geoip":{"ip":"81.62.38.214","country_code2":"CH","country_code3":"CHE","country_name":"Switzerland","continent_code":"EU","latitude":47.0,"longitude":8.0,"timezone":"Europe/Zurich","location":[8.0,47.0]}}
If you need custom patterns that do not exist in the predefined sets (unfortunate, but it happens), use the online Grok Debugger to test the matching.
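A custom pattern can also be checked against a handful of sample lines locally. The sketch below assumes a hypothetical custom format: the combined format with Apache's `%D` (request duration in microseconds) appended. In grok this could be written as `%{COMBINEDAPACHELOG} %{NUMBER:duration}`; here it is spelled out as a plain Python regex so it runs without Logstash. The sample lines and the `check` helper are made up for illustration.

```python
import re

# HYPOTHETICAL custom pattern: combined format plus a trailing %D duration.
# Written as a plain Python regex, not grok syntax.
CUSTOM = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d+) (?:\d+|-) "[^"]*" "[^"]*" (?P<duration>\d+)$'
)


def check(pattern, samples):
    """Return (sample, captures or None) for every sample line."""
    results = []
    for line in samples:
        m = pattern.match(line)
        results.append((line, m.groupdict() if m else None))
    return results


SAMPLES = [
    # Hypothetical line with a trailing duration of 5320 microseconds: should match.
    '81.62.38.214 - - [17/Jun/2015:09:27:37 +0200] "GET /wp/ HTTP/1.1" 200 7141 "-" "Mozilla/5.0" 5320',
    # The original example has no duration field: should NOT match.
    '81.62.38.214 - - [17/Jun/2015:09:27:37 +0200] "GET /wp/ HTTP/1.1" 200 7141 "-" '
    '"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"',
]

for line, captures in check(CUSTOM, SAMPLES):
    print("MATCH   " if captures else "NO MATCH", captures)
```

Keeping a few known-good and known-bad lines side by side like this makes it easy to notice when a pattern change starts matching too much or too little.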