Other Observability Tools
Contents of this page:
- Kafster, burrow, Clarebot, honeycomb
- Log sampling
- Kafka
- FileBeat / Lumberjack
- Logrotate
- StatsD
- Prometheus
Kafster, burrow, Clarebot, honeycomb
Log sampling
- I think this is when so many logs are being generated that it is impractical to store / analyse all of them, so instead just a proportion of them is analysed
- Like this (random sampling): https://www.rsyslog.com/doc/v8-stable/tutorials/random_sampling.html
- Like this (hash-based sampling): https://www.rsyslog.com/doc/v8-stable/tutorials/hash_sampling.html
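A minimal sketch of the two approaches in Python, to show the difference between them. This is not how rsyslog implements it; the function names, the "req-N" line format and the bucket count are all made up for illustration.

```python
import random
import zlib


def random_sample(lines, rate, rng=None):
    """Keep each line independently with probability `rate` (0.0 to 1.0)."""
    if rng is None:
        rng = random.Random()
    return [line for line in lines if rng.random() < rate]


def hash_sample(lines, key_of, keep_buckets, total_buckets=100):
    """Keep a line when the hash of its key lands in the first `keep_buckets` buckets.

    Lines sharing a key (e.g. a request ID) are either all kept or all dropped.
    """
    kept = []
    for line in lines:
        bucket = zlib.crc32(key_of(line).encode("utf-8")) % total_buckets
        if bucket < keep_buckets:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical log lines of the form "<request_id> <message>".
    logs = [f"req-{i % 50} step-{i}" for i in range(1000)]
    print(len(random_sample(logs, 0.1, random.Random(0))))
    print(len(hash_sample(logs, lambda line: line.split()[0], keep_buckets=10)))
```

Random sampling keeps roughly the requested proportion of lines but can keep only part of a request's log trail. Hash-based sampling makes the keep/drop decision per key, so related lines stay together.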
Kafka
- Message queue / bus
- Pub-sub model
- Various data sources publish to it, and consumers read that data
- This doc (private to Clare) documents how Kafka was part of the logging solution at Eritrea.
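A toy in-memory model of the pub-sub idea, in Python. It is not the Kafka client API; the class and method names are invented. It only illustrates that messages stay in the log after being read and that each consumer tracks its own position (offset).

```python
class Topic:
    """Toy stand-in for a single-partition topic: an append-only list of messages."""

    def __init__(self):
        self._log = []

    def publish(self, message):
        """Append a message and return its offset (position in the log)."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset, max_messages=10):
        """Return up to `max_messages` messages starting at `offset`."""
        return self._log[offset:offset + max_messages]


class Consumer:
    """Tracks its own offset, so consumers read independently of each other."""

    def __init__(self, topic):
        self._topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        """Fetch the next batch and advance this consumer's offset past it."""
        batch = self._topic.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch
```

For example, if two consumers are attached to the same topic, one can be fully caught up while the other is still working through older messages, and neither affects what the other sees.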
FileBeat / Lumberjack
- Run on the hosts
- Read log files from disk, then pass the data to logstash or kafka
- The original software mostly writes its logs to disk
- Config:
  - If you want to see which log files are getting their data written to an ELK cluster and where it's being sent, check the filebeat config (in /etc/filebeat.d) or lumberjack config (eg /etc/lumberjack.conf) on the host
  - You can then for instance search for "thingelk" to find entries with dest:"thingelk" and see which log files are being sent to thingelk
  - Or just generally scroll down to see the "files" section, where each element has
    - Fields - how the data will be constructed in the destination?
    - Paths - which log files are being sent
    - Dest - where data will end up
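A rough Python sketch of the "which files go to thingelk" lookup. It assumes a JSON config with a "files" list whose entries carry "paths", "fields" and "dest" keys, as in the notes above. The example paths, field values and the "otherelk" destination are made up, and the real config on a host may be laid out differently.

```python
import json

# Hypothetical config text; the keys mirror the "files" section described above.
EXAMPLE_CONFIG = """
{
  "files": [
    {"paths": ["/var/log/httpd/access_log"], "fields": {"type": "apache"}, "dest": "thingelk"},
    {"paths": ["/var/log/messages"], "fields": {"type": "syslog"}, "dest": "otherelk"}
  ]
}
"""


def paths_sent_to(config_text, dest):
    """Return every log path whose "files" entry has the given dest."""
    config = json.loads(config_text)
    paths = []
    for entry in config.get("files", []):
        if entry.get("dest") == dest:
            paths.extend(entry.get("paths", []))
    return paths


if __name__ == "__main__":
    print(paths_sent_to(EXAMPLE_CONFIG, "thingelk"))
```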
Logrotate
- logrotate is used for Linux utilities; Java stuff like ES uses log4j instead
- Config for httpd on thingelk hosts lives in /etc/logrotate_httpd.conf
- In config
  - If it says daily at the top, the rotate number lower down is counted in those units (so it might be 4 days, not 4 weeks)
  - delaycompress - a daily log is rotated out containing the previous 24 hours of logs, but it is only compressed at the next rotation, i.e. another 24 hours later
  - The values at the top are the defaults, then they are overridden for individual file types
    - So for instance it might say rotate 4 at the top, then rotate 3 in the type you care about
    - This is a bit unusual though - often the global / default stuff is in a separate file.
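To make the arithmetic concrete, here is a small Python sketch of how the defaults, the overrides and the rotation frequency combine. It only models the idea; it does not parse a real logrotate config, and the dict keys ("frequency", "rotate") and function names are made up for illustration.

```python
# Approximate length of each rotation period in days (monthly taken as 30).
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def effective_settings(defaults, overrides):
    """Per-file settings win over the defaults at the top of the config."""
    return {**defaults, **overrides}


def retention_days(settings):
    """Roughly how far back the rotated logs reach: rotate count x rotation period."""
    return settings["rotate"] * PERIOD_DAYS[settings["frequency"]]


if __name__ == "__main__":
    defaults = {"frequency": "daily", "rotate": 4}
    httpd = effective_settings(defaults, {"rotate": 3})
    print(retention_days(defaults))  # daily x rotate 4
    print(retention_days(httpd))     # daily x rotate 3
```

So with daily at the top, rotate 4 means about 4 days of rotated logs, whereas the same rotate 4 under weekly would mean about 4 weeks.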
StatsD
See https://docs.google.com/document/d/1ydXrsZX_g9uyvgMoZUovT0G8L5OcrXxRQDSWzw3ULDo/edit
Prometheus
- More info here: https://docs.google.com/document/d/1ydXrsZX_g9uyvgMoZUovT0G8L5OcrXxRQDSWzw3ULDo/edit
- Like Graphite, contains a time series data store
- Julius Volz - original author
- A pull model - actively scrapes data to store it
- As well as providing a data store, also handles alerting
- Can be queried to investigate the data
- Prometheus can contain more info than Graphite
  - You can add arbitrary labels, but limited by cardinality of the values in the labels
  - More flexible, not hierarchical, easier to slice things
- Sliding window ephemeral store
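A toy Python sketch of why labels make things easier to slice and why cardinality is the limit. It is an in-memory stand-in, not the Prometheus API or its storage format; the class, method and metric names are made up. The point is that every distinct combination of label values becomes its own series.

```python
from collections import defaultdict


class ToySeriesStore:
    """In-memory stand-in for a labelled time series store (not the Prometheus API)."""

    def __init__(self):
        # One series per unique (metric name, label set) combination.
        self._series = defaultdict(list)

    def record(self, name, labels, timestamp, value):
        """Append a sample to the series identified by the name and label set."""
        key = (name, frozenset(labels.items()))
        self._series[key].append((timestamp, value))

    def select(self, name, **matchers):
        """Return {label set: samples} for series whose labels match every matcher."""
        result = {}
        for (series_name, label_set), samples in self._series.items():
            labels = dict(label_set)
            if series_name == name and all(
                labels.get(k) == v for k, v in matchers.items()
            ):
                result[label_set] = samples
        return result

    def series_count(self):
        """Number of distinct series held, i.e. the cardinality."""
        return len(self._series)


if __name__ == "__main__":
    store = ToySeriesStore()
    for method in ("GET", "POST"):
        for status in ("200", "500"):
            store.record("http_requests_total", {"method": method, "status": status}, 0, 1.0)
    print(store.series_count())
    print(len(store.select("http_requests_total", method="GET")))
```

Two labels with two values each already give four series. A label with unbounded values (a user ID, say) would create a new series per value, which is the cardinality problem.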