
Building and Packaging Flink 1.10 with Apache Maven



Give me a big, fat SUCCESS!


Second update: 2019/6/11


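Because I needed to modify some Flink modules, I had to build and package Flink myself. Figuring it out took a long time, so after successfully packaging 1.8.1 and 1.10.1 I'm writing down the pitfalls I ran into.

Preparation

Some preparation is needed before you package Flink.

Environment and prerequisites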

Component    Version
OS           Windows 10
Maven        3.6.3
JDK          8u231
Scala        2.11.8
Hadoop       2.7.6
Node.js      v12.14.0
Flink        1.8.1 and 1.10.1

Make sure every path contains only English (ASCII) characters.
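A quick way to check this before starting a long build is a tiny shell function. This is my own sketch (the name is_ascii_path is made up); it simply treats any byte outside printable ASCII as a problem:

```bash
#!/usr/bin/env bash
# Sketch: succeed (return 0) only if the path consists of printable ASCII.
is_ascii_path() {
  # In the C locale grep treats every byte as one character, so any byte
  # outside the range space..tilde (for example UTF-8 encoded Chinese) matches.
  if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
    return 1
  fi
  return 0
}
```

Usage, from the source root: `is_ascii_path "$PWD" || echo "move the Flink sources to an ASCII-only path"`.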

Some differences

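When compiling Flink 1.8.1, several of the errors I hit were caused by dependencies declared with

<scope>test</scope>
<type>test-jar</type>

Flink 1.10.1 did not have this problem. It mainly showed up in two modules, flink-s3-fs-hadoop and flink-oss-fs-hadoop.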
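To see which modules of your checkout declare test-jar dependencies, a recursive grep is enough. A sketch (the function name is mine). One thing worth checking, stated as my understanding rather than something I verified on this build: -DskipTests only skips running tests, while -Dmaven.test.skip=true also skips compiling tests and building test jars, which can break any module that depends on another module's test-jar.

```bash
#!/usr/bin/env bash
# Sketch: list pom.xml files under a source tree that declare a
# dependency with <type>test-jar</type>. Defaults to the current directory.
find_test_jar_poms() {
  local root="${1:-.}"
  # -r: recurse, -l: print file names only, --include: only look at pom.xml files
  grep -rl --include=pom.xml '<type>test-jar</type>' "$root" | sort
}
```

Usage: `find_test_jar_poms ~/src/flink-release-1.8.1`.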

For Flink 1.8.1, I also added the following to flink-connectors/flink-hadoop-compatibility/pom.xml:

<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>1.4</version>
</dependency>

and the following to flink-connectors/pom.xml:

<dependency>
<groupId>commons-net</groupId>
<artifactId>commons-net</artifactId>
<version>3.6</version>
</dependency>

None of these problems appeared in the 1.10.1 release downloaded from GitHub. I don't know whether that's thanks to improvements or something else.

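Network

To completely rule out the network as a cause of build errors, I took several measures:

Use a phone hotspot: very effective. China Telecom broadband suffers from severe DNS pollution, while China Mobile seems to be the most permissive toward foreign sites.

SSR: one fast, stable line.

Point Maven's proxy at the SSR port in settings.xml: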

<?xml version="1.0" encoding="UTF-8"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<!--
| This is the configuration file for Maven. It can be specified at two levels:
|
| 1. User Level. This settings.xml file provides configuration for a single user,
| and is normally provided in ${user.home}/.m2/settings.xml.
|
| NOTE: This location can be overridden with the CLI option:
|
| -s /path/to/user/settings.xml
|
| 2. Global Level. This settings.xml file provides configuration for all Maven
| users on a machine (assuming they're all using the same Maven
| installation). It's normally provided in
| ${maven.conf}/settings.xml.
|
| NOTE: This location can be overridden with the CLI option:
|
| -gs /path/to/global/settings.xml
|
| The sections in this sample file are intended to give you a running start at
| getting the most out of your Maven installation. Where appropriate, the default
| values (values used when the setting is not specified) are provided.
|
|-->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<!-- localRepository
| The path to the local repository maven will use to store artifacts.
|
| Default: ${user.home}/.m2/repository

-->

<!-- interactiveMode
| This will determine whether maven prompts you when it needs input. If set to false,
| maven will use a sensible default value, perhaps based on some other setting, for
| the parameter in question.
|
| Default: true
<interactiveMode>true</interactiveMode>
-->

<!-- offline
| Determines whether maven should attempt to connect to the network when executing a build.
| This will have an effect on artifact downloads, artifact deployment, and others.
|
| Default: false
<offline>false</offline>
-->

<!-- pluginGroups
| This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
| when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
| "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
|-->
<pluginGroups>
<!-- pluginGroup
| Specifies a further group identifier to use for plugin lookup.
<pluginGroup>com.your.plugins</pluginGroup>
-->
</pluginGroups>

<!-- proxies
| This is a list of proxies which can be used on this machine to connect to the network.
| Unless otherwise specified (by system property or command-line switch), the first proxy
| specification in this list marked as active will be used.
|-->
<proxies>
<!--
proxy
| Specification for one proxy, to be used in connecting to the network.
|
<proxy>

<id>httpproxy</id>
<active>true</active>
<protocol>http</protocol>

<username>proxyuser</username>
<password>proxypass</password>
<nonProxyHosts>local.net|some.host.com</nonProxyHosts>

<host>socks5://127.0.0.1</host>
<port>1080</port>

</proxy>

<proxy>
<id>httpsproxy</id>
<active>true</active>
<protocol>https</protocol>

<username>proxyuser</username>
<password>proxypass</password>
<nonProxyHosts>local.net|some.host.com</nonProxyHosts>

<host>socks5://127.0.0.1</host>
<port>1080</port>
</proxy> -->

<proxy>
<id>ss</id>
<active>true</active>
<protocol>socks5</protocol>
<username></username>
<password></password>
<host>127.0.0.1</host>
<port>1080</port>
<nonProxyHosts>127.0.0.1</nonProxyHosts>
</proxy>
</proxies>

<!-- servers
| This is a list of authentication profiles, keyed by the server-id used within the system.
| Authentication profiles can be used whenever maven must make a connection to a remote server.
|-->
<servers>
<!-- server
| Specifies the authentication information to use when connecting to a particular server, identified by
| a unique name within the system (referred to by the 'id' attribute below).
|
| NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are
| used together.
|
<server>
<id>deploymentRepo</id>
<username>repouser</username>
<password>repopwd</password>
</server>
-->

<!-- Another sample, using keys to authenticate.
<server>
<id>siteServer</id>
<privateKey>/path/to/private/key</privateKey>
<passphrase>optional; leave empty if not used.</passphrase>
</server>
-->
</servers>

<!-- mirrors
| This is a list of mirrors to be used in downloading artifacts from remote repositories.
|
| It works like this: a POM may declare a repository to use in resolving certain artifacts.
| However, this repository may have problems with heavy traffic at times, so people have mirrored
| it to several places.
|
| That repository definition will have a unique id, so we can create a mirror reference for that
| repository, to be used as an alternate download site. The mirror site will be the preferred
| server for that repository.
|-->
<mirrors>
<!-- mirror
| Specifies a repository mirror site to use instead of a given repository. The repository that
| this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
| for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
|
<mirror>
<id>mirrorId</id>
<mirrorOf>repositoryId</mirrorOf>
<name>Human Readable Name for this Mirror.</name>
<url>http://my.repository.com/repo/path</url>
</mirror>
-->




<mirror>


<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
<mirror>
<id>alimaven</id>
<mirrorOf>central</mirrorOf>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
</mirror>

<mirror>
<id>ibiblio</id>
<mirrorOf>central</mirrorOf>
<name>Human Readable Name for this Mirror.</name>
<url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
</mirror>
<mirror>
<id>jboss-public-repository-group</id>
<mirrorOf>central</mirrorOf>
<name>JBoss Public Repository Group</name>
<url>http://repository.jboss.org/nexus/content/groups/public</url>
</mirror>

<mirror>
<id>central</id>
<name>Maven Repository Switchboard</name>
<url>https://repo1.maven.org/maven2/</url>
<mirrorOf>central</mirrorOf>
</mirror>
<mirror>
<id>repo2</id>
<mirrorOf>central</mirrorOf>
<name>Human Readable Name for this Mirror.</name>
<url>http://repo2.maven.org/maven2/</url>
</mirror>

</mirrors>

<!-- profiles
| This is a list of profiles which can be activated in a variety of ways, and which can modify
| the build process. Profiles provided in the settings.xml are intended to provide local machine-
| specific paths and repository locations which allow the build to work in the local environment.
|
| For example, if you have an integration testing plugin - like cactus - that needs to know where
| your Tomcat instance is installed, you can provide a variable here such that the variable is
| dereferenced during the build process to configure the cactus plugin.
|
| As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
| section of this document (settings.xml) - will be discussed later. Another way essentially
| relies on the detection of a system property, either matching a particular value for the property,
| or merely testing its existence. Profiles can also be activated by JDK version prefix, where a
| value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.
| Finally, the list of active profiles can be specified directly from the command line.
|
| NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
| repositories, plugin repositories, and free-form properties to be used as configuration
| variables for plugins in the POM.
|
|-->
<profiles>
<!-- profile
| Specifies a set of introductions to the build process, to be activated using one or more of the
| mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
| or the command line, profiles have to have an ID that is unique.
|
| An encouraged best practice for profile identification is to use a consistent naming convention
| for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.
| This will make it more intuitive to understand what the set of introduced profiles is attempting
| to accomplish, particularly when you only have a list of profile id's for debug.
|
| This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
<profile>
<id>jdk-1.4</id>

<activation>
<jdk>1.4</jdk>
</activation>

<repositories>
<repository>
<id>jdk14</id>
<name>Repository for JDK 1.4 builds</name>
<url>http://www.myhost.com/maven/jdk14</url>
<layout>default</layout>
<snapshotPolicy>always</snapshotPolicy>
</repository>
</repositories>
</profile>
-->

<!--
| Here is another profile, activated by the system property 'target-env' with a value of 'dev',
| which provides a specific path to the Tomcat instance. To use this, your plugin configuration
| might hypothetically look like:
|
| ...
| <plugin>
| <groupId>org.myco.myplugins</groupId>
| <artifactId>myplugin</artifactId>
|
| <configuration>
| <tomcatLocation>${tomcatPath}</tomcatLocation>
| </configuration>
| </plugin>
| ...
|
| NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to
| anything, you could just leave off the <value/> inside the activation-property.
|
<profile>
<id>env-dev</id>

<activation>
<property>
<name>target-env</name>
<value>dev</value>
</property>
</activation>

<properties>
<tomcatPath>/path/to/tomcat/instance</tomcatPath>
</properties>
</profile>
-->
</profiles>

<!-- activeProfiles
| List of profiles that are active for all builds.
|
<activeProfiles>
<activeProfile>alwaysActiveProfile</activeProfile>
<activeProfile>anotherAlwaysActiveProfile</activeProfile>
</activeProfiles>
-->
</settings>

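Do a quick test: with the proxy on, connect through the Aliyun mirror; if some plugin still cannot be downloaded, comment out the Aliyun mirror entries. That way you can be sure the network is not the problem.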
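One thing worth knowing about the mirror list above: Maven does not fall back from one mirror to the next. When several mirrors declare the same `<mirrorOf>central</mirrorOf>`, it simply picks the first match, so only the first Aliyun entry is actually used, and commenting it out is what makes the next one take over. Mirror ids are also supposed to be unique (the comment inside the file says so), yet the two Aliyun entries above share the id alimaven. A rough sketch to spot duplicate ids; it is line-based, assumes one tag per line as in the file above, and is not a real XML parser:

```bash
#!/usr/bin/env bash
# Sketch: print every <id> that appears more than once inside <mirror> elements.
duplicate_mirror_ids() {
  awk '
    /<mirror>/   { in_mirror = 1 }   # entering a <mirror> element
    /<\/mirror>/ { in_mirror = 0 }   # leaving it
    in_mirror && /<id>/ {
      gsub(/.*<id>|<\/id>.*/, "")    # keep only the text between the id tags
      print
    }
  ' "$1" | sort | uniq -d
}
```

Usage: `duplicate_mirror_ids ~/.m2/settings.xml`.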

The web UI (dashboard)

The web UI needs Node.js. In my builds, if Node.js was not prepared in advance, the build easily got stuck at this step, for reasons I never figured out.

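During the Flink build, the following command is executed once:

npm ci --cache-max=0 --no-save

Run it yourself beforehand in the flink-release-1.10.1\flink-runtime-web\web-dashboard directory. If it completes without trouble, you're fine.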
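`npm ci` installs strictly from a lockfile (package-lock.json or npm-shrinkwrap.json) and refuses to run without one, which matters if you ever delete package-lock.json while cleaning up. A small pre-check sketch, assuming the flink-runtime-web/web-dashboard layout mentioned above (the function name is mine):

```bash
#!/usr/bin/env bash
# Sketch: check the preconditions for `npm ci` in the dashboard directory.
# $1: Flink source root
precheck_dashboard() {
  local dir="$1/flink-runtime-web/web-dashboard"
  if [ ! -d "$dir" ]; then
    echo "not found: $dir" >&2
    return 1
  fi
  # npm ci only works from a lockfile.
  if [ ! -f "$dir/package-lock.json" ] && [ ! -f "$dir/npm-shrinkwrap.json" ]; then
    echo "no lockfile in $dir; npm ci will refuse to run" >&2
    return 1
  fi
  return 0
}
```

Usage, from the Flink source root: `precheck_dashboard . && (cd flink-runtime-web/web-dashboard && npm ci --cache-max=0 --no-save)`.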

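Problems I ran into:

First, this command simply would not complete for me. At some step it tried to fetch a .node file from GitHub, and the download never finished. Since I had already ruled out the network, I opened the link myself and found that the file had moved. After downloading it manually, I used

node XX.node

to load it by hand. I'm not sure whether this command had any effect, because npm later threw a flood of errors and I tried many things to fix them; I can't tell whether any of them undid this step.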

The error:

[Screenshot of the npm error]

The error was mitigated after I updated the package below. I say "mitigated" for a reason I'll explain later.

First look up the latest version of that package, then run the following in the flink-release-1.10.1\flink-runtime-web\web-dashboard directory:

npm uninstall @angular-devkit/build-angular
npm install @angular-devkit/build-angular@0.901.7

The npm commands for clearing the cache and reinstalling, which you may also need:

rm -rf node_modules
rm package-lock.json
npm cache clear --force
npm install
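Because `rm -rf node_modules` is destructive when run in the wrong directory, I'd wrap the first two steps in a guard. A sketch (hypothetical helper name) that only deletes when the target directory has a package.json; afterwards run the cache-clearing and `npm install` steps as above. `npm install` writes a fresh package-lock.json, which `npm ci` needs again later.

```bash
#!/usr/bin/env bash
# Sketch: remove node_modules and package-lock.json, but only inside
# a directory that actually contains a package.json.
reset_npm_state() {
  local dir="${1:-.}"
  if [ ! -f "$dir/package.json" ]; then
    echo "refusing: $dir has no package.json" >&2
    return 1
  fi
  rm -rf "$dir/node_modules"
  rm -f "$dir/package-lock.json"
}
```

Usage: `reset_npm_state flink-runtime-web/web-dashboard`.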

Building

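The first problem was that the flink-shaded-hadoop-2 module could not be found in Maven Central. It turned out the official documentation already explains this.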

Before building, look up what you need in the official documentation, Building Flink From Source:

If the used Hadoop version is not listed on the download page (possibly due to being a Vendor-specific version), then it is necessary to build flink-shaded against this version. You can find the source code for this project in the Additional Components section of the download page.

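There are two ways to solve this:

  1. Download a jar of a similar version from a Maven repository, install it into your local repository with the install command, and just change the version number. This works in most cases and is a very general approach.
     mvn install:install-file -DgroupId=org.apache.flink -DartifactId=flink-shaded-hadoop-2 -Dversion=2.7.6-9.0 -Dpackaging=jar -Dfile=./flink-shaded-hadoop-2-2.7.5-7.0.jar
  2. Download flink-shaded and build and package it first. Note that there are different Hadoop flavors, such as the various CDH versions.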
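For option 1, the version passed to -Dversion has to be exactly the one Maven reports as missing. In the example above it is 2.7.6-9.0, which looks like `<Hadoop version>-<flink-shaded version>`; I'm assuming that pattern here, so double-check it against your own error. A sketch that only prints the command so you can review it before running it (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Sketch: print (not run) the install:install-file command that registers a
# downloaded jar as org.apache.flink:flink-shaded-hadoop-2 in the local repository.
# Assumption: the requested version is "<hadoop version>-<flink-shaded version>".
shaded_hadoop_install_cmd() {
  local hadoop_version="$1" shaded_version="$2" jar="$3"
  printf 'mvn install:install-file -DgroupId=org.apache.flink -DartifactId=flink-shaded-hadoop-2 -Dversion=%s-%s -Dpackaging=jar -Dfile=%s\n' \
    "$hadoop_version" "$shaded_version" "$jar"
}
```

`shaded_hadoop_install_cmd 2.7.6 9.0 ./flink-shaded-hadoop-2-2.7.5-7.0.jar` prints the same command as in option 1.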

Because CDH builds of various versions are involved, add the following repositories so the required packages can be found:

<profile>
<id>vendor-repos</id>
<activation>
<property>
<name>vendor-repos</name>
</property>
</activation>
<!-- Add vendor maven repositories -->
<repositories>
<!-- Cloudera -->
<repository>
<id>cloudera-releases</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
<!-- Hortonworks -->
<repository>
<id>HDPReleases</id>
<name>HDP Releases</name>
<url>https://repo.hortonworks.com/content/repositories/releases/</url>
<snapshots><enabled>false</enabled></snapshots>
<releases><enabled>true</enabled></releases>
</repository>
<repository>
<id>HortonworksJettyHadoop</id>
<name>HDP Jetty</name>
<url>https://repo.hortonworks.com/content/repositories/jetty-hadoop</url>
<snapshots><enabled>false</enabled></snapshots>
<releases><enabled>true</enabled></releases>
</repository>
<!-- MapR -->
<repository>
<id>mapr-releases</id>
<url>https://repository.mapr.com/maven/</url>
<snapshots><enabled>false</enabled></snapshots>
<releases><enabled>true</enabled></releases>
</repository>
</repositories>
</profile>

CDH example:

mvn clean install -DskipTests -Drat.skip=true -Pvendor-repos -Dhadoop.version=2.6.0-cdh5.16.1

For newer Flink versions, you have to build this module yourself against your Hadoop version:

git clone https://github.com/apache/flink-shaded.git

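Alternatively, download a specific release. Which version? You can find it in the suffix of the artifact version in the error message. Note that this version number is not the same as the Flink version.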
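To pull that suffix out of a long log, something like the following works, assuming the error contains Maven's usual `groupId:artifactId:jar:version` coordinates (the function name is mine). For 2.7.6-9.0 it prints 9.0, i.e. flink-shaded 9.0, not a Flink version:

```bash
#!/usr/bin/env bash
# Sketch: read a Maven log on stdin and print the flink-shaded version suffix of
# the first flink-shaded-hadoop-2 coordinate found, e.g. 2.7.6-9.0 -> 9.0.
shaded_version_from_error() {
  local coord version
  coord=$(grep -o 'org\.apache\.flink:flink-shaded-hadoop-2:jar:[0-9A-Za-z._-]*' | head -n 1)
  [ -n "$coord" ] || return 1        # nothing found in the log
  version=${coord##*:}               # text after the last ':'  -> 2.7.6-9.0
  printf '%s\n' "${version##*-}"     # text after the last '-'  -> 9.0
}
```

Usage: `shaded_version_from_error < build.log`.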

I cloned with git, so I first configured git's proxy:

git config --global http.proxy 'socks5://127.0.0.1:1080' 
git config --global https.proxy 'socks5://127.0.0.1:1080'

After cloning, enter the directory.

List the remote branches:

git branch -a
* master
remotes/origin/HEAD -> origin/master
remotes/origin/master
remotes/origin/release-1.0
remotes/origin/release-10.0
remotes/origin/release-11.0
remotes/origin/release-3.0
remotes/origin/release-4.0
remotes/origin/release-5.0
remotes/origin/release-6.0
remotes/origin/release-7.0
remotes/origin/release-8.0
remotes/origin/release-9.0

List the local branches:

git branch
* master

Show branch details:

git branch -va

Check out the missing flink-shaded version, creating a local branch named v9.0 from it and working on that branch:

git checkout -b v9.0 origin/release-9.0
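Before checking out, you can confirm that the release branch for the flink-shaded version you need actually exists. A sketch that reads `git branch -a` output (the helper name is hypothetical):

```bash
#!/usr/bin/env bash
# Sketch: read `git branch -a` output on stdin and succeed if
# remotes/origin/release-<version> is listed.
has_release_branch() {
  # Strip the "* " marker / indentation git prints, then match whole lines literally.
  sed 's/^[* ]*//' | grep -Fx "remotes/origin/release-$1" >/dev/null
}
```

Usage: `git branch -a | has_release_branch 9.0 && git checkout -b v9.0 origin/release-9.0`.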

Then pick the appropriate Hadoop version and build with the command above.

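Later, flink-shaded-hadoop-2-uber ran into much the same problem. Since I couldn't find the project that produces it, I just downloaded a jar and installed it into the Maven repository with the command mentioned above.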

Node.js setup and permissions (for reference; the version used here is not the latest)

cd ~
wget https://npm.taobao.org/mirrors/node/v10.14.1/node-v10.14.1-linux-x64.tar.gz
tar zxvf node-v10.14.1-linux-x64.tar.gz
mv node-v10.14.1-linux-x64 node
ln -s ~/node/bin/node /usr/local/bin/node
ln -s ~/node/bin/npm /usr/local/bin/npm
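# Install cnpm globally from the Taobao npm mirror
# (note: that mirror has since moved to npmmirror.com, so these URLs may no longer resolve)
npm install -g cnpm --registry=https://registry.npm.taobao.org
# Or: alias cnpm to npm pointed at the mirror
alias cnpm="npm --registry=https://registry.npm.taobao.org \
--cache=$HOME/.npm/.cache/cnpm \
--disturl=https://npm.taobao.org/dist \
--userconfig=$HOME/.cnpmrc"
# Deal with npm permissions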
npm config -g set unsafe-perm

Test

# Type
npm
# If you see something like the following, the environment is fine
Usage: npm <command>
where <command> is one of:

Maven commands

mvn clean install -Dfast -DskipTests -Pvendor-repos -Drat.skip=true -Pinclude-hadoop -Dhadoop.version=2.7.6 -Dmaven.compile.fork=true -Dscala-2.11 -T 2C

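Under Windows PowerShell, quote the arguments that contain a dot, since PowerShell may otherwise split them; -T and its value must stay outside the quotes:

mvn clean install -DskipTests -Dfast -Pvendor-repos '-Drat.skip=true' -Pinclude-hadoop '-Dhadoop.version=2.7.6' '-Dmaven.compile.fork=true' '-Dscala-2.11' -T 8C

# -Dscala-2.11                     # build for Scala 2.11
# -Pvendor-repos                   # needed when using CDH or HDP Hadoop
# -Dfast                           # activates the "fast" profile in Flink's root pom.xml, which skips several build-time checks, e.g. the Apache license header check (rat), code style checks and javadoc generation; see pom.xml for details
# install                          # Maven's install phase
# -T 2C                            # parallel build (threads per CPU core) to speed things up; Maven 3.3 or later recommended
# -Dhadoop.version=2.6.0-cdh5.7.0  # the Hadoop version to build against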
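The PowerShell case shows how easily two flags end up glued into one argument. In bash, one way to keep every flag separate is to pass them as individual words inside a function. The sketch below (function name and parameters are my own) reproduces the bash command above; its first argument is the program to run, so passing `echo` gives a dry run:

```bash
#!/usr/bin/env bash
# Sketch: run the full build command with every flag as its own argument.
# $1: program to run (mvn for a real build, echo for a dry run)
# $2: Hadoop version
# $3: value for -T (defaults to 2C, i.e. two threads per CPU core)
flink_build() {
  local runner="$1" hadoop_version="$2" threads="${3:-2C}"
  "$runner" clean install -Dfast -DskipTests -Pvendor-repos -Drat.skip=true \
    -Pinclude-hadoop "-Dhadoop.version=${hadoop_version}" \
    -Dmaven.compile.fork=true -Dscala-2.11 -T "$threads"
}
```

`flink_build echo 2.7.6` prints the command; `flink_build mvn 2.7.6 8C` runs it.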

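Important addendum 1

A successful mvn clean install does not necessarily mean the packaged result will run; you have to verify that it actually does.

With the mvn commands above that specify a Hadoop version, I barely managed to package Flink, but the result would not run. In the end, the simplest command produced a build that ran:
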
mvn clean install -DskipTests -Dfast
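As a first check that a build is usable, I'd at least look at the distribution layout before starting a cluster. The sketch assumes the usual Flink 1.10 binary layout (bin/flink, bin/start-cluster.sh, lib/flink-dist_<scala>-<version>.jar), and the usage line assumes the build-target link that Flink's README mentions; treat both as assumptions and adjust the path to your build output.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a built Flink distribution directory before running it.
check_dist_layout() {
  local dist="$1" missing=0 f jar found=0
  for f in bin/flink bin/start-cluster.sh; do
    if [ ! -f "$dist/$f" ]; then
      echo "missing: $dist/$f" >&2
      missing=1
    fi
  done
  # An unmatched glob stays literal, so -f is false when no jar exists.
  for jar in "$dist"/lib/flink-dist_*.jar; do
    [ -f "$jar" ] && found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "missing: $dist/lib/flink-dist_*.jar" >&2
    missing=1
  fi
  return "$missing"
}
```

Usage: `check_dist_layout build-target && ./build-target/bin/start-cluster.sh`, then submit one of the bundled examples, e.g. `./build-target/bin/flink run ./build-target/examples/streaming/WordCount.jar`.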

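Important addendum 2

During packaging I ran into a few very frustrating problems. One of them, I believe, was again the test-jar: install would hang at that step and stay stuck. It turned out that a profile in the pom was missing some properties; once I added them, it worked.

That one was comparatively simple. The next problem I'd probably have to handle case by case even if I met it again, with no guarantee of success, so I'll mainly describe the approach.

The problem is the framework's web UI: the flink-runtime-web module and the dashboard inside it.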

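This module depends on Node.js and npm. I built it on a Mac, and managing Node.js on macOS is a bit convoluted (brew, then n, then this and that).
I recommend the simplest route: unpack the plain tarball, set the environment variables in your zsh or bash config, and source it.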
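To confirm that only one node (and one npm) is visible, list every match on PATH in lookup order. `type -a node` in bash gives much the same information; the function below is a hypothetical helper that also accepts an explicit search path, which makes it easy to test:

```bash
#!/usr/bin/env bash
# Sketch: print every executable named $1 on a search path, in lookup order.
# $2 is optional and defaults to $PATH.
which_all() {
  local name="$1" search="${2-$PATH}" dir
  local IFS=:
  for dir in $search; do
    [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
    if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}
```

Usage: `which_all node; which_all npm`. More than one line of output means more than one installation is reachable.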

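Once you have made sure there is exactly one global node, the annoying part is that during the build Flink installs another, fresh node into the dashboard directory

...

which is a bit absurd: the modules you set up locally don't take effect in that new environment.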

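One caveat: this doesn't happen every time, and the build won't necessarily fail at this step. But if it does, patiently use the log to find the module involved, then edit the <execution> entries in its pom.xml.
After changing the executions, you can resume with mvn's -rf :<module-name> option, which jumps straight to building that module.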
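When a build fails, Maven itself prints a resume hint near the end of the log, along the lines of "After correcting the problems, you can resume the build with the command ... -rf :<module>". A sketch that extracts that module name from a saved log, assuming you captured the output with something like `mvn clean install -DskipTests -Dfast | tee build.log`; the helper name is mine:

```bash
#!/usr/bin/env bash
# Sketch: read a Maven build log on stdin and print the module name from the
# last "-rf :<module>" resume hint; fail if there is none.
resume_module() {
  local m
  m=$(grep -o -e '-rf :[A-Za-z0-9._-]*' | tail -n 1)
  [ -n "$m" ] || return 1
  printf '%s\n' "${m#-rf :}"   # drop the "-rf :" prefix
}
```

Usage: `mvn clean install -DskipTests -Dfast -rf ":$(resume_module < build.log)"`.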

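Roughly what I did: in the flink-runtime-web module, find the <execution> entries, note down the commands they run (using the log), and run them once by hand in the dashboard directory first:
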
cd flink-runtime-web/web-dashboard
npm ci --cache-max=0 --no-save
npm update

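When that finishes, don't rush to run mvn. Compare the folders under the dashboard's node_modules with the globally installed modules and merge them, put the result back into the dashboard folder, then edit the <execution> entries in the pom and remove the initialization commands (you've already run them by hand).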

You also need to install ng (the Angular CLI):

npm i -g @angular/cli@1.3.0

Make sure ng works. If the version is too high, you need to downgrade it; the error for a too-high version is:
too many symbolic links encountered, stat …..

npm uninstall -g @angular/cli
# npm 5 and later refuse to clear the cache without --force
npm cache clean --force
npm i -g @angular/cli@1.3.0

Other

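For example, code without the Apache 2.0 license header cannot be added to the build, and some tests don't need to be compiled along with everything else. There are two ways around this: remove the corresponding configuration from the pom, or add the license header to the files.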
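For the second approach (adding the license), here is a sketch that prepends the standard ASF header to a source file that lacks one. It uses the same text as at the top of the settings.xml above, in Java/Scala comment style. It is idempotent: a file that already contains the header is left alone. The function names are hypothetical; the other route is the -Drat.skip=true flag used in the commands above.

```bash
#!/usr/bin/env bash
# Print the standard ASF license header as a Java/Scala block comment.
asf_header() {
  cat <<'EOF'
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

EOF
}

# Sketch: prepend the header to a source file that does not have one yet.
add_asf_header() {
  local file="$1" tmp status
  if grep -q 'Licensed to the Apache Software Foundation' "$file"; then
    return 0   # already licensed; leave the file untouched
  fi
  tmp=$(mktemp) || return 1
  # Header + original content go to a temp file, which is then copied back
  # over the original (cat > keeps the original file's permissions).
  { asf_header; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage: `add_asf_header path/to/MyNewClass.java`.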