References
https://www.cnblogs.com/davidwang456/p/4485433.html?_t=1443088424295
https://segmentfault.com/a/1190000009550668
https://blog.csdn.net/huixueyi/article/details/81117379
https://www.cnblogs.com/FlyAway2013/p/10944836.html
On Red Hat 6.5, the following components were installed via yum:
java-1.8.0-openjdk-1.8.0.242.b07-1.el6_10.x86_64
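mongodb-server-2.4.14-4.el6.x86_64 (metadata)
graylog-server-2.3.2-1.noarch (log display and search)
elasticsearch-2.4.6-1.noarch (log data)
rsyslog-5.8.10-12.el6.x86_64 (collection)
Problems:
1. yum was configured to reach the internet through a proxy (proxy=http://192.168.1.250:3128). After the host's IP address changed, the Squid configuration no longer allowed this host to use the proxy, and it took a long time to track this down.
2. elasticsearch 5.x was installed first and started normally, but graylog kept reporting "graylog Could not load field information", and elasticsearch would no longer start after network.host was changed in elasticsearch.yml. Installing elasticsearch 2.x instead worked.
3. The syslog tcp input in graylog could not receive data. The gelf udp input did receive the WAF logs, but they could not be displayed or searched. Only after adding *.* @@192.168.0.245:5142 to rsyslog.conf could the collected log data be displayed and searched.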
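When an input appears to receive nothing, it can help to send a single hand-built message to it and then look for that message in the search page. The `@@` prefix in the rsyslog.conf line above selects TCP forwarding, so the following is a minimal sketch of a TCP probe. The function names, the `testhost` hostname and the `probe` tag are made up for illustration, and the RFC 3164 style line format is an assumption about what the input expects.

```python
import socket
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def build_syslog_line(message, hostname="testhost", tag="probe",
                      facility=1, severity=6, now=None):
    """Build one RFC 3164 style syslog line, terminated by a newline."""
    now = now or datetime.now()
    # PRI is facility * 8 + severity (user.info gives 14).
    pri = facility * 8 + severity
    # RFC 3164 pads the day of month with a space, not a zero.
    stamp = f"{MONTHS[now.month - 1]} {now.day:2d} {now.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {hostname} {tag}: {message}\n"

def send_probe(host, port, message, timeout=5.0):
    """Open a TCP connection and send a single newline-framed message."""
    line = build_syslog_line(message).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line)

# Example call (address and port taken from the rsyslog.conf line above):
# send_probe("192.168.0.245", 5142, "graylog input probe")
```

If the connection is refused or times out, the problem is on the network or listener side rather than in message parsing.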
Problems during operation:
Journal utilization is too high
Uncommited messages deleted from journal (triggered 17 hours ago)
Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (At this point the search page showed no data for the last few hours.)
The graylog log contains the warning "2020-04-14T16:36:19.907+08:00 WARN [KafkaJournal] Journal utilization (96.0%) has gone over 95%"
The /var/lib/graylog-server/journal directory is 2.3G and the elasticsearch directory is 187G. The adjustable parameters found were message_journal_max_age = 12h and message_journal_max_size = 5gb
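To see how often and when the journal crosses the threshold, the warnings can be pulled out of the server log. This is a minimal sketch that assumes every warning has the same shape as the line quoted above; the function name is hypothetical.

```python
import re

# Matches lines such as:
# 2020-04-14T16:36:19.907+08:00 WARN [KafkaJournal] Journal utilization (96.0%) has gone over 95%
JOURNAL_WARN = re.compile(
    r"^(?P<ts>\S+)\s+WARN\s+\[KafkaJournal\]\s+"
    r"Journal utilization \((?P<pct>\d+(?:\.\d+)?)%\)"
)

def journal_utilization(lines):
    """Return (timestamp, percent) for each journal utilization warning."""
    found = []
    for line in lines:
        match = JOURNAL_WARN.search(line)
        if match:
            found.append((match.group("ts"), float(match.group("pct"))))
    return found

# Example use against the real log file:
# with open("/var/log/graylog-server/server.log") as handle:
#     for ts, pct in journal_utilization(handle):
#         print(ts, pct)
```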
[root@logserver2 data]# curl http://192.40.0.245:9200/_cat/health?v
[root@logserver2 data]# curl http://192.40.0.245:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open graylog_1 4 0 20003484 0 5.3gb 5.3gb
green open graylog_0 4 0 20000663 0 5.2gb 5.2gb
green open graylog_2 4 0 625396078 0 175.9gb 175.9gb
[root@logserver2 data]# curl http://192.40.0.245:9200/_cat/shards?v
index shard prirep state docs store ip node
graylog_2 2 p STARTED 157002506 44.1gb 192.40.0.245 Mister One
graylog_2 3 p STARTED 157005388 44.1gb 192.40.0.245 Mister One
graylog_2 1 p STARTED 156985606 44.1gb 192.40.0.245 Mister One
graylog_2 0 p STARTED 157009165 44.3gb 192.40.0.245 Mister One
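The `_cat/indices` output above can be totalled to check it against the size of the elasticsearch directory. The sketch below parses the pasted table; the helper names are hypothetical, and treating the `kb`/`mb`/`gb`/`tb` suffixes as powers of 1024 is an assumption.

```python
def parse_size_gb(text):
    """Convert a _cat size such as '5.3gb' or '512mb' to gigabytes."""
    units = {"kb": 1 / 1024**2, "mb": 1 / 1024, "gb": 1.0, "tb": 1024.0}
    text = text.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return float(text[:-len(suffix)]) * factor
    if text.endswith("b"):  # plain bytes
        return float(text[:-1]) / 1024**3
    raise ValueError(f"unrecognised size: {text!r}")

def index_sizes(cat_output):
    """Map index name to store size in GB, using the header row to find columns."""
    rows = [line.split() for line in cat_output.strip().splitlines()]
    header, body = rows[0], rows[1:]
    name_col = header.index("index")
    size_col = header.index("store.size")
    return {row[name_col]: parse_size_gb(row[size_col]) for row in body}

CAT_INDICES = """
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open graylog_1 4 0 20003484 0 5.3gb 5.3gb
green open graylog_0 4 0 20000663 0 5.2gb 5.2gb
green open graylog_2 4 0 625396078 0 175.9gb 175.9gb
"""

sizes = index_sizes(CAT_INDICES)
total_gb = sum(sizes.values())
print(f"total: {total_gb:.1f} GB, largest: {max(sizes, key=sizes.get)}")
```

For the table above the three indices add up to about 186.4 GB, which is in line with the 187G directory size, and graylog_2 accounts for most of it.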
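Finally, graylog-server and elasticsearch were restarted and an index retention policy was configured.
Following the link below, a loganalyzer+apache+php+mysql log server was installed on the same host: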
https://www.cnblogs.com/mchina/p/linux-centos-rsyslog-loganalyzer-mysql-log-server.html
Uncommited messages deleted from journal (triggered 19 hours ago)
Journal utilization is too high (triggered 19 hours ago)
The log file /var/log/graylog-server/server.log contains the following:
2020-05-18T13:10:50.479+08:00 WARN [KafkaJournal] Journal utilization (96.0%) has gone over 95%
The /var/lib/graylog-server/journal directory is 2.8GB and the elasticsearch directory is 367GB
[root@logserver2 elasticsearch]# curl http://192.40.0.245:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1589849275 08:47:55 graylog green 1 1 24 24 0 0 0 0 - 100.0%
[root@logserver2 elasticsearch]# curl http://192.40.0.245:9200/_cat/indices?v
green open graylog_4 4 0 283712567 0 81.9gb 81.9gb
green open graylog_3 4 0 331426010 0 95.8gb 95.8gb
green open graylog_2 4 0 630577716 0 178.1gb 178.1gb
[root@logserver2 elasticsearch]# curl http://192.40.0.245:9200/_cat/shards?v
graylog_4 1 p STARTED 70945974 20.4gb 192.40.0.245 Rom the Spaceknight
graylog_4 2 p STARTED 70956187 20.4gb 192.40.0.245 Rom the Spaceknight
graylog_4 3 p STARTED 70943771 20.4gb 192.40.0.245 Rom the Spaceknight
graylog_4 0 p STARTED 70945706 20.6gb 192.40.0.245 Rom the Spaceknight
graylog_3 1 p STARTED 82855626 23.9gb 192.40.0.245 Rom the Spaceknight
graylog_3 2 p STARTED 82844697 23.9gb 192.40.0.245 Rom the Spaceknight
graylog_3 3 p STARTED 82867925 23.9gb 192.40.0.245 Rom the Spaceknight
graylog_3 0 p STARTED 82857762 23.9gb 192.40.0.245 Rom the Spaceknight
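http://192.40.0.245:9000/api/system/jobs returns {"jobs":[]}
The alert has now been active for 8 days and is still unresolved, so the following graylog configuration parameters were changed as an attempt: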
elasticsearch_max_docs_per_index = 2000000000
elasticsearch_max_number_of_indices = 100
output_batch_size = 5000
message_journal_max_size = 40gb
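Before raising message_journal_max_size this far, it seems worth checking that the journal volume can actually hold it. The sketch below parses `key = value` settings like the ones above and compares the journal limit with free disk space. The helper names and the 80% headroom figure are assumptions made for illustration, as is reading `gb` as 1024^3 bytes.

```python
SIZE_UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_settings(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

def size_to_bytes(value):
    """Convert '40gb' style values to bytes (binary units assumed)."""
    value = value.strip().lower()
    for suffix, factor in SIZE_UNITS.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * factor
    return int(value)

def journal_fits(settings, free_bytes, headroom=0.8):
    """True if the journal limit stays within `headroom` of the free space."""
    limit = size_to_bytes(settings["message_journal_max_size"])
    return limit <= free_bytes * headroom

CONF = """
elasticsearch_max_docs_per_index = 2000000000
elasticsearch_max_number_of_indices = 100
output_batch_size = 5000
message_journal_max_size = 40gb
"""

settings = parse_settings(CONF)

# Example check against the real journal volume:
# import shutil
# free = shutil.disk_usage("/var/lib/graylog-server/journal").free
# print(journal_fits(settings, free))
```

A larger journal only buys more buffering time. If Elasticsearch keeps indexing more slowly than messages arrive, the journal will still fill up eventually.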