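5 Hands-on Practice

The core of using Flume is the configuration file:

Configure the Source
Configure the Channel
Configure the Sink
Wire the three together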

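5.1 Scenario 1 - Collect data from a given network port and print it to the console

Let's start with the first example from the official documentation.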

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

a1: the agent name

r1: the source name

k1: the sink name

c1: the channel name

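Let's look at each component used here.

Source: netcat

A netcat-like source that listens on a given port and turns each line of text into an event. It behaves like nc -k -l [host] [port]. In other words, it opens the specified port and listens for data. It expects the supplied data to be newline-separated text. Each line of text is turned into a Flume event and sent through the connected channel.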
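Besides the required type, bind, and port, the netcat source accepts a few optional settings. The fragment below is a sketch that uses the property names max-line-length and ack-every-event as listed in the Flume user guide. The values are illustrative assumptions, not settings taken from this walkthrough.

```properties
# Optional netcat source settings (illustrative values)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Maximum length of one line (one event body) in bytes; the documented default is 512
a1.sources.r1.max-line-length = 1024
# Reply "OK" to the client for every event received; the documented default is true
a1.sources.r1.ack-every-event = true
```

With ack-every-event left at its default, a telnet client sees an OK response after each line it sends, which is a quick way to confirm the source accepted the event.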

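Sink: logger

Logs events at INFO level. It is typically used for testing and debugging. According to the official user guide, this sink is the only one that does not need the extra configuration described in its "Logging raw data" section.

Channel: memory

Events are stored in an in-memory queue with a configurable maximum size. It is ideal for flows that need higher throughput and can afford to lose the staged data if the agent fails.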

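Walkthrough

Create the example.conf configuration file in the conf directory.

Start an agent

An agent is started using the flume-ng shell script, which is located in the bin directory of the Flume distribution. You need to specify the agent name, the config directory, and the config file on the command line: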

bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

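Recall what each option means: --name (-n) is the agent name and must match the a1 prefix used in the configuration file, --conf (-c) is the configuration directory, --conf-file (-f) is the configuration file, and -Dflume.root.logger=INFO,console sends INFO-level log output to the console. For this example: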

bin/flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/example.conf \
-Dflume.root.logger=INFO,console

The agent now starts running the source and sink configured in the given properties file.

Verify with telnet

Note that the address and port must match the bind and port values of the netcat source:

telnet 127.0.0.1 44444

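Two lines of data were sent from the telnet session, and the agent's console shows that they were received.

Let's analyze one of the logged events in detail: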

2019-06-12 17:52:39,711 (SinkRunner-PollingRunner-DefaultSinkProcessor)
[INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] 
Event: { headers:{} body: 4A 61 76 61 45 64 67 65 0D                      JavaEdge. }

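The Event is the basic unit of data transfer in Flume:

Event = optional headers + byte array

Here the headers are empty, and the body bytes 4A 61 76 61 45 64 67 65 spell JavaEdge. The trailing 0D is the carriage return sent by telnet, which the logger prints as a dot.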

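5.2 Scenario 2 - Monitor a file and print newly appended data to the console in real time

Exec Source

The exec source runs a given Unix command on start-up and expects that process to continuously produce data on standard output (stderr is simply discarded unless the logStdErr property is set to true). If the process exits for any reason, the source also exits and produces no further data. This means that configurations such as cat [named pipe] or tail -F [file] produce the desired results, whereas date probably will not: the former two commands produce streams of data, while the latter produces a single event and exits.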
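The behaviour described above can be adjusted through optional exec source properties. The fragment below is a minimal sketch that assumes the property names logStdErr, restart, and restartThrottle from the Flume user guide. The log path is a placeholder, and c1 is assumed to be a channel defined elsewhere in the same file.

```properties
# Exec source that tails a file (the path is a placeholder, adjust it to your system)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /path/to/app.log
# Log the command's stderr instead of discarding it (documented default: false)
a1.sources.r1.logStdErr = true
# Restart the command if it exits (documented default: false)
a1.sources.r1.restart = true
# Milliseconds to wait before attempting a restart (documented default: 10000)
a1.sources.r1.restartThrottle = 10000
a1.sources.r1.channels = c1
```

Even with restart enabled, the exec source gives no delivery guarantee: lines produced while the agent or channel is unavailable can be lost.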

Agent selection

exec source + memory channel + logger sink

Configuration file

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /Volumes/doc/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

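Save the configuration above as a new file under the conf directory and start the agent.

As lines are appended to data.log, the agent prints them to the console, which confirms that the data was received successfully.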

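5.3 Scenario 3 - Collect logs from server A to server B in real time

Technology selection

Server A: exec source + memory channel + avro sink

Server B: avro source + memory channel + logger sink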

exec-memory-avro.conf

# Name the components on this agent
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /Volumes/doc/data/data.log
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = localhost
exec-memory-avro.sinks.avro-sink.port = 44444

# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory
exec-memory-avro.channels.memory-channel.capacity = 1000
exec-memory-avro.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

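The receiving agent on server B is not listed above. A minimal sketch of it follows, assuming the agent and its file are named avro-memory-logger (hypothetical names chosen to mirror exec-memory-avro) and that its avro source listens on the same host and port that the avro sink targets.

```properties
# avro-memory-logger.conf (hypothetical file name)

# Name the components on this agent
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Avro source: must listen where the avro sink of exec-memory-avro connects
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = localhost
avro-memory-logger.sources.avro-source.port = 44444

# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger

# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory
avro-memory-logger.channels.memory-channel.capacity = 1000
avro-memory-logger.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
```

Both agents here run on one machine, so localhost works on each side. On two real servers, the sink's hostname would point at server B and the source's bind would be an address reachable from server A. It is usually easier to start the avro-memory-logger agent first (with --name avro-memory-logger), so that the avro sink has a listener to connect to, and then start exec-memory-avro.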

Reference: https://tech.meituan.com/2013/12/09/meituan-flume-log-system-architecture-and-design.html
