Quick Study: Azkaban in Practice

3 Azkaban in Practice

Azkaban's built-in job types include command and java.

3.1 Single-job example

1) Create the job description file

[atguigu@hadoop102 jobs]$ vim first.job
#first.job
type=command
command=echo 'this is my first job'
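Before uploading, it can help to confirm that the job file parses the way you expect. The sketch below is one way to do that locally: it writes the same first.job into a scratch directory (the mktemp location is an assumption, not a path from this tutorial), extracts the command= value with sed, and runs it through sh. Azkaban launches the command on its executor and may not pass it through a shell in the same way, so treat this as a rough sanity check only.

```shell
#!/bin/sh
# Sketch: create first.job in a scratch directory and dry-run its command locally.
set -eu

workdir="$(mktemp -d)"      # hypothetical scratch directory
cd "$workdir"

# Same contents as the first.job shown above
cat > first.job <<'EOF'
#first.job
type=command
command=echo 'this is my first job'
EOF

# Extract everything after "command=" and run it with sh
cmd="$(sed -n 's/^command=//p' first.job)"
output="$(sh -c "$cmd")"
echo "$output"
```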

2) Package the job file into a zip file

[atguigu@hadoop102 jobs]$ zip first.zip first.job 
  adding: first.job (deflated 15%)
[atguigu@hadoop102 jobs]$ ll
total 8
-rw-rw-r--. 1 atguigu atguigu  60 Oct 18 17:42 first.job
-rw-rw-r--. 1 atguigu atguigu 219 Oct 18 17:43 first.zip

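Note:

Currently, Azkaban only accepts workflow uploads as xxx.zip files. The zip should contain the xxx.job files plus any other files the jobs need in order to run (job file names must end in .job, otherwise they are not recognized). Job names must be unique within a project.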
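The two rules in the note (files must end in .job, job names must be unique) can be checked before zipping. The following is a minimal sketch, assuming the job name is the file name without its .job suffix and that uniqueness applies across subdirectories of the package; check_jobs is a hypothetical helper, not something Azkaban ships.

```shell
#!/bin/sh
# Sketch: pre-upload check for a directory of Azkaban job files.
# check_jobs is a hypothetical helper, not part of Azkaban.
check_jobs() {
    dir="$1"

    # There must be at least one file ending in .job
    if ! find "$dir" -type f -name '*.job' | grep -q .; then
        echo "no .job files found in $dir" >&2
        return 1
    fi

    # Job names (file name without .job) must not repeat anywhere in the tree
    dups="$(find "$dir" -type f -name '*.job' -exec basename {} .job \; | sort | uniq -d)"
    if [ -n "$dups" ]; then
        echo "duplicate job names: $dups" >&2
        return 1
    fi
    return 0
}
```

A typical use would be to run `check_jobs /opt/module/azkaban/jobs` and only build the zip when it returns 0.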

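3) Create a project in the Azkaban web console and upload the job zip package

First, create the project

Upload the zip package

4) Run the job

Click to execute the workflow

Click Continue

5) The job completes successfully

6) Click to view the job log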

3.2 Multi-job workflow example

1) Create several job description files with dependencies between them

First job: start.job

[atguigu@hadoop102 jobs]$ vim start.job
#start.job
type=command
command=touch /opt/module/kangkang.txt

Second job: step1.job, which depends on start.job

[atguigu@hadoop102 jobs]$ vim step1.job
#step1.job
type=command
dependencies=start
command=echo "this is step1 job"

Third job: step2.job, which depends on start.job

[atguigu@hadoop102 jobs]$ vim step2.job
#step2.job
type=command
dependencies=start
command=echo "this is step2 job"

Fourth job: finish.job, which depends on step1.job and step2.job

[atguigu@hadoop102 jobs]$ vim finish.job
#finish.job
type=command
dependencies=step1,step2
command=echo "this is finish job"
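The dependencies= lines above form a small graph: start runs first, step1 and step2 can run once start is done, and finish waits for both. As a rough illustration (not how Azkaban itself schedules), the sketch below recreates the four job files in a scratch directory, turns each dependency into a "dependency job" pair, and lets the standard tsort utility print one valid run order. The scratch directory and the edges helper are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: derive a valid run order from the dependencies= lines with tsort.
set -eu

workdir="$(mktemp -d)"      # hypothetical scratch directory
cd "$workdir"

# Recreate the four job files from this section
printf 'type=command\ncommand=touch /opt/module/kangkang.txt\n' > start.job
printf 'type=command\ndependencies=start\ncommand=echo "this is step1 job"\n' > step1.job
printf 'type=command\ndependencies=start\ncommand=echo "this is step2 job"\n' > step2.job
printf 'type=command\ndependencies=step1,step2\ncommand=echo "this is finish job"\n' > finish.job

# Print one "dependency job" pair per edge of the flow graph
edges() {
    for f in *.job; do
        job="${f%.job}"
        # Split the comma-separated dependency list into words
        deps="$(sed -n 's/^dependencies=//p' "$f" | tr ',' ' ')"
        for dep in $deps; do
            echo "$dep $job"
        done
    done
}

# tsort prints the jobs so that every dependency comes before its dependents
order="$(edges | tsort)"
echo "$order"
```

The relative order of step1 and step2 is not fixed, since neither depends on the other. A job with no dependencies and no dependents would not appear in the output, because it contributes no edge.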

2) Package all job files into a single zip file

[atguigu@hadoop102 jobs]$ zip jobs.zip start.job step1.job step2.job finish.job
updating: start.job (deflated 16%)
  adding: step1.job (deflated 12%)
  adding: step2.job (deflated 12%)
  adding: finish.job (deflated 14%)

3) Create a project in the Azkaban web console and upload the zip package

4) Start the workflow

5) Check the result

Exercise:

Upload student.txt to HDFS, create an external table over the uploaded file, then write the query results from that table to a local file.
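One possible way to approach the exercise is to chain three command jobs with dependencies=. The sketch below only generates the job and SQL files in a scratch directory; it does not run Hadoop or Hive. Everything specific in it is an assumption made for illustration: the table name student_ext, the HDFS directory /student_ext, the local export directory, the .sql file names and their location under /opt/module/azkaban/jobs, and the use of command.1 for a second command in the same job.

```shell
#!/bin/sh
# Sketch of one possible answer to the exercise: three chained command jobs.
# Table name, HDFS path, export path, and file names are all assumptions.
set -eu

workdir="$(mktemp -d)"      # hypothetical scratch directory
cd "$workdir"

# Job 1: upload student.txt into its own HDFS directory
cat > upload.job <<'EOF'
type=command
command=/opt/module/hadoop-2.7.2/bin/hadoop fs -mkdir -p /student_ext
command.1=/opt/module/hadoop-2.7.2/bin/hadoop fs -put -f /opt/module/datas/student.txt /student_ext
EOF

# Job 2: create an external table on top of the uploaded file
cat > create_table.sql <<'EOF'
create external table if not exists student_ext(id int, name string)
row format delimited fields terminated by '\t'
location '/student_ext';
EOF
cat > create_table.job <<'EOF'
type=command
dependencies=upload
command=/opt/module/hive/bin/hive -f /opt/module/azkaban/jobs/create_table.sql
EOF

# Job 3: write the query result to a local directory
cat > export.sql <<'EOF'
insert overwrite local directory '/opt/module/datas/student_ext'
row format delimited fields terminated by '\t'
select * from student_ext;
EOF
cat > export.job <<'EOF'
type=command
dependencies=create_table
command=/opt/module/hive/bin/hive -f /opt/module/azkaban/jobs/export.sql
EOF

ls
```

With this layout the .sql files would need to be copied to the path named in the job files before the flow runs, and the three .job files would be zipped and uploaded as in the earlier steps.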

3.3 Java job

Use Azkaban to schedule a Java program

1) Write the Java program

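package com.atguigu.azkaban;

import java.io.FileOutputStream;
import java.io.IOException;

public class AzkabanTest {
	public void run() throws IOException {
		// Write your own logic here; this example just writes a marker file.
		// try-with-resources closes the stream even if write() throws.
		try (FileOutputStream fos = new FileOutputStream("/opt/module/azkaban/output")) {
			fos.write("this is a java progress".getBytes());
		}
	}

	public static void main(String[] args) throws IOException {
		// Entry point invoked by the Azkaban javaprocess job
		AzkabanTest azkabanTest = new AzkabanTest();
		azkabanTest.run();
	}
}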

2) Build the Java program into a jar, create a lib directory, and put the jar in it

[atguigu@hadoop102 azkaban]$ mkdir lib
[atguigu@hadoop102 azkaban]$ cd lib/
[atguigu@hadoop102 lib]$ ll
total 4
-rw-rw-r--. 1 atguigu atguigu 3355 Oct 18 20:55 azkaban-0.0.1-SNAPSHOT.jar

3) Write the job file

[atguigu@hadoop102 jobs]$ vim azkabanJava.job
#azkabanJava.job
type=javaprocess
java.class=com.atguigu.azkaban.AzkabanTest
classpath=/opt/module/azkaban/lib/*

4) Package the job file into a zip file

[atguigu@hadoop102 jobs]$ zip azkabanJava.zip azkabanJava.job 
  adding: azkabanJava.job (deflated 19%)

5) Create a project in the Azkaban web console, upload the job zip package, and run the job

[atguigu@hadoop102 azkaban]$ pwd
/opt/module/azkaban
[atguigu@hadoop102 azkaban]$ ll
total 24
drwxrwxr-x.  2 atguigu atguigu 4096 Oct 17 17:14 azkaban-2.5.0
drwxrwxr-x. 10 atguigu atguigu 4096 Oct 18 17:17 executor
drwxrwxr-x.  2 atguigu atguigu 4096 Oct 18 20:35 jobs
drwxrwxr-x.  2 atguigu atguigu 4096 Oct 18 20:54 lib
-rw-rw-r--.  1 atguigu atguigu   23 Oct 18 20:55 output
drwxrwxr-x.  9 atguigu atguigu 4096 Oct 18 17:17 server
[atguigu@hadoop102 azkaban]$ cat output 
this is a java progress

3.3 HDFS operation job

1) Create the job description file

[atguigu@hadoop102 jobs]$ vim fs.job
#hdfs job
type=command
command=/opt/module/hadoop-2.7.2/bin/hadoop fs -mkdir /azkaban
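Note that hadoop fs -mkdir reports an error when the target directory already exists, so a second run of the job above would probably fail. A rerunnable variant, sketched below, adds the -p flag and is otherwise identical. The script writes the file into a scratch directory, which is an assumption for illustration; in the tutorial it would live in the jobs directory.

```shell
#!/bin/sh
# Sketch: write a rerunnable variant of fs.job (the -p flag is the only change).
set -eu

workdir="$(mktemp -d)"      # hypothetical scratch directory
cd "$workdir"

cat > fs.job <<'EOF'
#hdfs job
type=command
command=/opt/module/hadoop-2.7.2/bin/hadoop fs -mkdir -p /azkaban
EOF

cat fs.job
```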

2) Package the job file into a zip file

[atguigu@hadoop102 jobs]$ zip fs.zip fs.job 
  adding: fs.job (deflated 12%)

3) Create a project in the Azkaban web console and upload the job zip package

4) Run the job

5) Check the result

3.4 MapReduce job

MapReduce jobs can be scheduled with Azkaban in the same way

1) Create the job description file and prepare the MR program jar

[atguigu@hadoop102 jobs]$ vim mapreduce.job
#mapreduce job
type=command
command=/opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wordcount/input /wordcount/output
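The wordcount example refuses to start if its output directory already exists, so the job above succeeds only once unless /wordcount/output is removed between runs. One way to handle that, sketched below, is to delete the directory in a first command and run the MapReduce program in a second one. The use of command.1 for the second command is an assumption about the command job type that should be checked against your Azkaban version; the scratch directory is also just for illustration.

```shell
#!/bin/sh
# Sketch: a rerunnable mapreduce.job that clears the old output directory first.
set -eu

workdir="$(mktemp -d)"      # hypothetical scratch directory
cd "$workdir"

cat > mapreduce.job <<'EOF'
#mapreduce job
type=command
command=/opt/module/hadoop-2.7.2/bin/hadoop fs -rm -r -f /wordcount/output
command.1=/opt/module/hadoop-2.7.2/bin/hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wordcount/input /wordcount/output
EOF

cat mapreduce.job
```

The -f flag keeps the removal from failing on the very first run, when there is no output directory yet.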

2) Package the job file into a zip file

[atguigu@hadoop102 jobs]$ zip mapreduce.zip mapreduce.job 
  adding: mapreduce.job (deflated 43%)

3) Create a project in the Azkaban web console and upload the zip package

4) Start the job

3.5 Hive script job

1) Create the job description file and the Hive script

(1) Hive script: student.sql

[atguigu@hadoop102 jobs]$ vim student.sql
use default;
drop table student;
create table student(id int, name string)
row format delimited fields terminated by '\t';
load data local inpath '/opt/module/datas/student.txt' into table student;
insert overwrite local directory '/opt/module/datas/student'
row format delimited fields terminated by '\t'
select * from student;
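The script loads /opt/module/datas/student.txt, and the table expects two tab-separated columns per line. The input file itself is not shown in this section, so the sketch below builds a plausible one, assuming its rows match the query output listed at the end of the section. It writes to a scratch directory rather than /opt/module/datas and checks that every line has exactly two fields.

```shell
#!/bin/sh
# Sketch: create a tab-separated student.txt that fits student(id int, name string).
set -eu

datadir="$(mktemp -d)"      # hypothetical; the tutorial reads /opt/module/datas

# printf reuses the format for each id/name pair, producing one row per pair
printf '%s\t%s\n' 1001 yangyang 1002 bobo 1003 banzhang 1004 pengpeng > "$datadir/student.txt"

# Collect any line that does not have exactly two tab-separated fields
bad="$(awk -F '\t' 'NF != 2' "$datadir/student.txt")"
[ -z "$bad" ] && echo "student.txt looks well-formed"

cat "$datadir/student.txt"
```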

(2) Job description file: hive.job

[atguigu@hadoop102 jobs]$ vim hive.job
#hive job
type=command
command=/opt/module/hive/bin/hive -f /opt/module/azkaban/jobs/student.sql

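2) Package the job file into a zip file

[atguigu@hadoop102 jobs]$ zip hive.zip hive.job 
  adding: hive.job (deflated 21%)

3) Create a project in the Azkaban web console, upload the zip package, and start the job

4) Check the result

[atguigu@hadoop102 student]$ cat /opt/module/datas/student/000000_0 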
1001    yangyang
1002    bobo
1003    banzhang
1004    pengpeng
