
I use VSCode as my local text editor; with Hexo and PlantUML integrated, I can embed UML diagrams while writing blog posts.

Installing the plugin

  1. Install the plugin: npm install hexo-tag-plantuml --save (see the plugin's git repo)
  2. Add the following to _config.yml

tag_plantuml:
  type: static

UML test

Commit to the remote repo; once hexo deploy finishes, the UML diagram is visible.

Displaying UML in VSCode

The approach above renders on the remote site, but it is inconvenient for local previewing. VSCode provides a plugin that renders the corresponding UML inside the editor.

Rendering with the official server (default)

Open VSCode, search for PlantUML in the extension list and install it, then wrap the UML code between @startuml and @enduml

@startuml
Bob->Alice : hello
@enduml

Open preview mode and the UML renders successfully.

Rendering UML with a local server

By default the plugin renders UML through the official server. That is convenient, but it needs network access and rendering can lag. We can run a local server for this instead; a ready-made docker image makes the setup simple.

  1. Start the container: docker run -d -p 8080:8080 plantuml/plantuml-server:jetty or docker run -d -p 8080:8080 plantuml/plantuml-server:tomcat; identical behavior, they just use different server implementations
  2. Another service occupies 8080 on my machine, so I changed the port mapping to -p 7999:8080
  3. Visit http://localhost:7999 to confirm the service is up
  4. Update the PlantUML plugin settings in VSCode
  • cmd+shift+p and search for open user setting
  • search for the keyword plantuml
  • under Plantuml: Render choose the local option
  • set Plantuml: Server to the local server address http://localhost:7999

Screenshot below

uml plugin setting

Click preview again and the earlier UML displays normally; setup complete.

I originally wanted an authentic Tomcat 4/5 setup, but after searching around I found no ready-made resources, so I took someone else's existing Tomcat 8 setup to get started; if necessary I'll dig up the 4/5 sources later.

  1. Visit the official site, choose the zip or tar.gz under Source Code Distributions, and download it. A glance at the README suggests the difference is line endings: CRLF in the zip, LF in the tar, presumably matching Windows and Linux.
  2. The download is an apache-tomcat-8.5.68-src.tar.gz archive; extract it
  3. In the extracted directory create a catalina-home folder and copy the conf and webapps folders from the source into it. Delete the examples folder under webapps, otherwise startup throws an exception later
  4. Under catalina-home create four folders: lib, temp, work, logs
  5. Create a pom.xml in the source directory and add the dependencies
  6. Delete the test directory in the source to avoid some unnecessary errors
  7. Open Idea, import the project, and set the run/debug launch parameters as below
  8. Edit the ContextConfig class and add context.addServletContainerInitializer(new JasperInitializer(), null); after webConfig();, otherwise visiting a page throws org.apache.jasper.JasperException: java.lang.NullPointerException
  9. Click run in Idea to test startup; it succeeds
<!-- launch parameters -->
main class: org.apache.catalina.startup.Bootstrap
vm options:
-Dcatalina.home="/Users/i306454/IdeaProjects/apache-tomcat-8.5.68-src/catalina-home"
-Duser.language=en
-Duser.region=US
-Dfile.encoding=UTF-8

PS: in Idea 2021.1.3 the vm options field is hidden by default; click Modify Options to add it

application

The final project directory structure:

.
└── apache-tomcat-8.5.68-src
    ├── catalina-home
    │   ├── conf
    │   ├── lib
    │   ├── logs
    │   ├── temp
    │   ├── webapps
    │   └── work
    └── pom.xml

PS: Tomcat 5 source code link

Introduce

Basically there are three things that a servlet container does to service a request for a servlet:

  • Create a request object and populate it with the information needed later, such as parameters and cookies. The request is an implementation of javax.servlet(.http).ServletRequest
  • Create a response object to return to the web client. The response is an implementation of javax.servlet(.http).ServletResponse
  • Invoke the servlet's service method, which reads from the request and puts its results into the response

Catalina Block Diagram

Catalina is complex, but its design is elegant and modular. It splits into two main parts, the Connector and the Container, related as follows

The connector's main job is to build the request/response and hand them to the container; this is only a simplified model. Besides handling requests, the container has plenty more to do, such as loading servlets and updating sessions.

A Simple Web Server

Chapter1 starts this book by presenting a simple HTTP server.
To build a working HTTP server, you need to know the internal workings of two classes in the java.net package: Socket and ServerSocket.
There is sufficient background information in this chapter about these two classes for you to understand how the accompanying application works.

The goal of the first exercise: build a simple web server. Once the server starts, typing an address into the browser makes the server return the requested static resource.

Logically the model can be drawn as below, but at the code level it does not work that way.

Going by the diagram above, does the client really new a request to talk to the web server? Does the web server new a response and send it to the client? No. The model is better drawn like this

The Client and the Web Server communicate over a socket. Inside the server, the socket is split into input and output IO streams, used respectively to read the data the Client sends and to send the response back to the Client

The project layout is as follows

how-tomcat-works
├── ex01
│   ├── pom.xml
│   └── src
│       └── ex01
│           ├── HttpServer.java
│           ├── Request.java
│           └── Response.java
├── pom.xml
└── webroot
    ├── images
    │   └── logo.gif
    └── index.html

HttpServer is the server class. Its implementation relies mainly on ServerSocket from the java.net package; it binds a local port and loops forever waiting for Client sockets

public class HttpServer {
    // directory for static resources
    public static final String WEB_ROOT = System.getProperty("user.dir") + File.separator + "webroot";
    // shutdown command
    private static final String SHUTDOWN_COMMAND = "/SHUTDOWN";
    // the shutdown command received
    private boolean shutdown = false;

    public static void main(String[] args) {
        HttpServer server = new HttpServer();
        server.await();
    }

    public void await() {
        ServerSocket serverSocket = null;
        int port = 8080;
        try {
            // bind the port and address
            serverSocket = new ServerSocket(port, 1, InetAddress.getByName("127.0.0.1"));
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // loop forever waiting for requests
        while (!shutdown) {
            try {
                // wait for a socket
                Socket socket = serverSocket.accept();
                // split the socket into streams
                InputStream input = socket.getInputStream();
                OutputStream output = socket.getOutputStream();

                // create Request object and parse
                Request request = new Request(input);
                request.parse();

                // create Response object
                Response response = new Response(output);
                response.setRequest(request);
                response.sendStaticResource();

                // Close the socket
                socket.close();
                // check if the previous URI is a shutdown command
                shutdown = request.getUri().equals(SHUTDOWN_COMMAND);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

The Request class's main duty is to take the socket's input stream, parse it, and extract the information the Client sent; in this example that mainly means the requested resource path

public class Request {
    private InputStream input;
    private String uri;

    public Request(InputStream input) {
        this.input = input;
    }

    public void parse() {
        // Read a set of characters from the socket
        StringBuilder request = new StringBuilder(2048);
        int i;
        byte[] buffer = new byte[2048];
        try {
            i = input.read(buffer);
        } catch (IOException e) {
            e.printStackTrace();
            i = -1;
        }
        for (int j = 0; j < i; j++) {
            request.append((char) buffer[j]);
        }
        // print the request
        System.out.print(request);
        uri = parseUri(request.toString());
    }

    // parse the request and extract the requested path
    private String parseUri(String requestString) {
        int index1, index2;
        index1 = requestString.indexOf(' ');
        if (index1 != -1) {
            index2 = requestString.indexOf(' ', index1 + 1);
            if (index2 > index1)
                return requestString.substring(index1 + 1, index2);
        }
        return null;
    }

    public String getUri() {
        return uri;
    }
}

Response is responsible for sending data back to the Client

public class Response {
    private static final int BUFFER_SIZE = 1024;
    Request request;
    OutputStream output;

    public Response(OutputStream output) {
        this.output = output;
    }

    public void setRequest(Request request) {
        this.request = request;
    }

    public void sendStaticResource() throws IOException {
        byte[] bytes = new byte[BUFFER_SIZE];
        FileInputStream fis = null;
        try {
            File file = new File(HttpServer.WEB_ROOT, request.getUri());
            if (file.exists()) {
                // Not in the book: browsers back then did not check the format. Modern
                // browsers won't render the page without this header; curl is unaffected
                String header = "HTTP/1.1 200 OK\r\n" +
                        "Content-Type: text/html\r\n" +
                        "\r\n";
                output.write(header.getBytes());

                fis = new FileInputStream(file);
                int ch = fis.read(bytes, 0, BUFFER_SIZE);
                while (ch != -1) {
                    output.write(bytes, 0, ch);
                    ch = fis.read(bytes, 0, BUFFER_SIZE);
                }
            } else {
                // file not found
                String errorMessage = "HTTP/1.1 404 File Not Found\r\n"
                        + "Content-Type: text/html\r\n"
                        + "Content-Length: 23\r\n" + "\r\n"
                        + "<h1>File Not Found</h1>";
                output.write(errorMessage.getBytes());
            }
        } catch (Exception e) {
            // thrown if cannot instantiate a File object
            System.out.println(e.toString());
        } finally {
            if (fis != null) {
                fis.close();
            }
        }
    }
}

Mapping this onto networking layers: Tomcat's problems live at the Application layer, and the HTTP specification shows up when handling the socket. That is why, when writing the response, the server side must explicitly specify the HTTP version, headers, and so on.

tar was originally for driving tape devices; its uses later broadened. Early on, Linux tools could not compress multiple files at once, so the usual approach was to bundle files into one archive with tar and then compress that archive.

I group the common tar operations into three kinds: packing, unpacking, and listing. Packing and unpacking can additionally apply compression.

Packing

  • -c, stands for create
  • -f, the target is a file; omitting it causes an error
  • -v, optional, print verbose output
  • -z, compress
ls
case-id-mapping create_ticket.sh sample_body

tar -cvf my.tar case-id-mapping create_ticket.sh sample_body
a case-id-mapping
a create_ticket.sh
a sample_body

ls -l
-rw-r--r-- 1 i306454 staff 9728 Jun 18 16:41 my.tar

tar -zcvf my.tar case-id-mapping create_ticket.sh sample_body
a case-id-mapping
a create_ticket.sh
a sample_body

ls -l my.tar
-rw-r--r-- 1 i306454 staff 2203 Jun 18 16:44 my.tar

tar -tvf my.tar
-rw-r--r-- 0 i306454 staff 4123 Jun 18 16:35 case-id-mapping
-rwxr--r-- 0 i306454 staff 1347 Jun 18 16:35 create_ticket.sh
-rw-r--r-- 0 i306454 staff 857 Jun 18 16:35 sample_body
  • Without -v, the middle commands print no a xxx lines
  • With -z, the archive actually gets compressed
  • A compressed tarball can still be listed with -t
  • -f must come last or you get an error; this follows from its definition: it takes the target file as its argument, so cvfz would mean processing a file named z
  • If you add -z, it is best to change the file suffix to .gz
  • To check whether a file is compressed, use the file command: file my.tar returns my.tar: POSIX tar archive
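The compression points above can be sketched in a throwaway directory (file names are invented for the demo):

```shell
set -e
cd "$(mktemp -d)"
echo hello > a.txt
echo world > b.txt

# plain archive: -c creates, -f names the target file
tar -cf plain.tar a.txt b.txt
file plain.tar           # a POSIX tar archive

# adding -z gzip-compresses; use the .gz suffix to signal that
tar -czf packed.tar.gz a.txt b.txt
file packed.tar.gz       # gzip compressed data

# -t still lists the compressed archive
tar -tf packed.tar.gz
```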

Unpacking

  • -x, the unpack operation, stands for extract
  • -z, I assumed this was needed, but testing shows the archive is decompressed automatically without it
tar -xvf my.tar                                              
x case-id-mapping
x create_ticket.sh
x sample_body

Listing

Already used during the compression steps above: the -t flag

tar -tvf my.tar
-rw-r--r-- 0 i306454 staff 4123 Jun 18 16:35 case-id-mapping
-rwxr--r-- 0 i306454 staff 1347 Jun 18 16:35 create_ticket.sh
-rw-r--r-- 0 i306454 staff 857 Jun 18 16:35 sample_body

Other

A few remaining flags

  • -r, append
  • -u, append only if the file has changed
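A minimal sketch of -r (file names invented); note that appending only works on uncompressed archives:

```shell
set -e
cd "$(mktemp -d)"
echo one > a.txt
tar -cf my.tar a.txt      # start an uncompressed archive

echo two > b.txt
tar -rf my.tar b.txt      # -r appends b.txt to the existing archive

tar -tf my.tar            # now lists both a.txt and b.txt
```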

The simplest awk program is a sequence of pattern-action pairs

pattern { action }
pattern { action }
...

Sometimes the pattern is omitted, sometimes the action. Once awk has checked the program for syntax errors, it executes it line by line. An omitted pattern matches every line.

The first section of this chapter introduces patterns; expressions, assignment, and so on come later, and the remainder covers functions and related details.

Preparing the input file takes some care: creating it directly in vscode caused problems, so it is safest to type it in the terminal with echo plus literal \t characters.
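The two omission cases above can be seen on a tiny inline input (data invented for the demo):

```shell
# pattern only: matching lines are printed whole
printf 'a 1\nb 2\n' | awk '/b/'             # -> b 2

# action only: the action runs on every line
printf 'a 1\nb 2\n' | awk '{ print $2 }'    # -> 1 and 2
```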

cat countries                  
USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America
Brazil 286 134 South America
India 1267 746 Asia
Mexico 762 78 North America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe

sed -n 'l' countries
USSR\t8649\t275\tAsia$
Canada\t3852\t25\tNorth America$
China\t3705\t1032\tAsia$
USA\t3615\t237\tNorth America$
Brazil\t286\t134\tSouth America$
India\t1267\t746\tAsia$
Mexico\t762\t78\tNorth America$
France\t211\t55\tEurope$
Japan\t144\t120\tAsia$
Germany\t96\t61\tEurope$
England\t94\t56\tEurope$

bat -A countries
1 │ USSR├──┤8649├──┤275├──┤Asia␊
2 │ Canada├──┤3852├──┤25├──┤North·America␊
3 │ China├──┤3705├──┤1032├──┤Asia␊
4 │ USA├──┤3615├──┤237├──┤North·America␊
5 │ Brazil├──┤286├──┤134├──┤South·America␊
6 │ India├──┤1267├──┤746├──┤Asia␊
7 │ Mexico├──┤762├──┤78├──┤North·America␊
8 │ France├──┤211├──┤55├──┤Europe␊
9 │ Japan├──┤144├──┤120├──┤Asia␊
10 │ Germany├──┤96├──┤61├──┤Europe␊
11 │ England├──┤94├──┤56├──┤Europe␊

Patterns

The pattern controls whether the action runs; six pattern types are introduced below

  • BEGIN { statements }
  • END { statements }
  • expression { statements }, statements run when expression is true
  • /regular expression/{ statements }
  • compound pattern { statements }, patterns joined with &&, ||, !, ()
  • pattern1, pattern2 { statements }

BEGIN and END

BEGIN is often used to change the field separator, which is controlled by the built-in variable FS; by default fields are separated by blanks and tabs. In the example below we change the separator to tab and total the area and population over all rows

awk '
BEGIN {
    FS = "\t"
    printf("%10s %6s %5s %s\n\n", "COUNTRY", "AREA", "POP", "CONTINENT")
}
{
    printf("%10s %6s %5d %s\n", $1, $2, $3, $4)
    area = area + $2
    pop = pop + $3
}
END { printf("\n%10s %6d %5d\n", "TOTAL", area, pop) }' countries
COUNTRY AREA POP CONTINENT

USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America
Brazil 286 134 South America
India 1267 746 Asia
Mexico 762 78 North America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe

TOTAL 22681 2819

Expressions as Patterns

Expressions here means arithmetic expressions, built from numbers, strings, and operators. In this book a string is a sequence of zero or more characters. The empty string "" is called the null string in awk, and every string contains the null string.

If an operator expects a string argument but gets a numeric one, awk converts the number to a string; likewise, a string supplied where a number is expected is converted automatically.

Common comparison operators

OPERATOR MEANING
< less than
<= less than or equal to
== equal to
!= not equal to
>= greater than or equal to
> greater than
~ matched by
!~ not matched by

Examples:

  • NF > 10: select lines with more than 10 fields
  • NF: a bare numeric expression matches when its numeric value is nonzero
  • string: matches when the value of the expression is nonnull
  • num operator num: numeric comparison
  • num operator str: both operands are converted to strings and compared
  • string operator string: compared character by character, e.g. "Canada" < "China"
awk '$0 >= "M"' countries                                                     
USSR 8649 275 Asia
USA 3615 237 North America
Mexico 762 78 North America

String-matching Patterns

Awk supports regular expressions

String-Matching Pattern

  • /regexpr/: matches when regexpr matches some part of the line
  • expression ~ /regexpr/: matches if the string value of expression contains a substring matched by regexpr
  • expression !~ /regexpr/: the opposite of the above
awk '$4 ~ /Asia/' countries 
USSR 8649 275 Asia
China 3705 1032 Asia
India 1267 746 Asia
Japan 144 120 Asia

awk '$4 !~ /Asia/' countries
Canada 3852 25 North America
USA 3615 237 North America
Brazil 286 134 South America
Mexico 762 78 North America
France 211 55 Europe
Germany 96 61 Europe
England 94 56 Europe

One way to see this part: the basic awk form is pattern { action }, and this matching is an extension of pattern, i.e. pattern = expression (~ or !~) /regexpr/

Regular Expressions

I'm confident this part needs no notes...

Just one thing worth recording: (Asian|European|North American) expresses word-level alternation

ESCAPE SEQUENCES

SEQUENCE MEANING
\b backspace
\f formfeed
\n newline (line feed)
\r carriage return
\t tab
\ddd octal value ddd, where ddd is 1-3 digits between 0-7
\c any other character c literally

Compound Patterns

Compound patterns: several expressions combined with logical operators

awk '$4 == "Asia" && $3 > 500' countries
China 3705 1032 Asia
India 1267 746 Asia

awk '$4 == "Asia" || $4 == "Europe"' countries
USSR 8649 275 Asia
China 3705 1032 Asia
India 1267 746 Asia
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe

The example above compares strings; regular expressions also work

awk '$4 ~ /^(Asia|Europe)$/' countries 
USSR 8649 275 Asia
China 3705 1032 Asia
India 1267 746 Asia
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe

If no other field contains these two keywords, a logical OR also does the filtering

# equivalent to awk '/Asia|Europe/' countries
awk '/Asia/||/Europe/' countries
USSR 8649 275 Asia
China 3705 1032 Asia
India 1267 746 Asia
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe

Precedence: ! > && > ||

Operators of the same precedence (within || and &&) evaluate left to right
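A quick sketch of the precedence rule: without parentheses, && binds tighter than ||:

```shell
awk 'BEGIN { x = 1 || 0 && 0; print x }'     # -> 1, parsed as 1 || (0 && 0)
awk 'BEGIN { y = (1 || 0) && 0; print y }'   # -> 0, parentheses force || first
```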

Range Patterns

Two patterns separated by a comma: pat1, pat2 matches the run of lines from a match of pat1 through a match of pat2. In the example below, select every line from where Canada appears through USA

awk '/Canada/, /USA/' countries 
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America

If the second pattern never matches, the range extends to the end of the input

awk '/Europe/, /Africa/' countries
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
  • FNR: the number of the line just read from current input file
  • FILENAME: the filename itself

Print lines one through five

awk 'FNR == 1, FNR == 5 { print FILENAME ":" $0 }' countries 
countries:USSR 8649 275 Asia
countries:Canada 3852 25 North America
countries:China 3705 1032 Asia
countries:USA 3615 237 North America
countries:Brazil 286 134 South America

The same effect can also be written as

awk 'FNR <= 5 { print FILENAME ": " $0 }' countries         
countries: USSR 8649 275 Asia
countries: Canada 3852 25 North America
countries: China 3705 1032 Asia
countries: USA 3615 237 North America
countries: Brazil 286 134 South America

Summary of Patterns

A summary of the pattern forms

PATTERN EXAMPLE MATCHES
BEGIN BEGIN before any input has been read
END END after all input has been read
expression $3 < 100 third field less than 100
string-matching /Asia/ lines that contain Asia
compound $3 < 100 && $4 == "Asia" third field less than 100 and fourth field is Asia
range NR==10, NR==20 tenth to twentieth lines of input inclusive

Actions

In the pattern-action form, the pattern decides whether the action runs. An action can be as simple as a print, or more complex: multiple statements, control flow, and so on. Later sections introduce user-defined functions and input/output syntax.

An action may contain any of the following

  • expressions with constants, variables, assignments, and function calls
  • print
  • printf
  • if 语句
  • if - else
  • while
  • for (expression; expression; expression) statement
  • for (variable in array) statement
  • do statement while (expression)
  • break
  • continue
  • next
  • exit
  • exit expression
  • { statement }

Expressions

Expressions are the simplest statements; they are combined with operators, of which there are five kinds

  • arthmetic
  • comparison
  • logical
  • conditional
  • assignment

Constants

Two constant types: string and numeric

A string is a double quote + characters + double quote; the characters may include escape sequences

Numbers are all represented internally as floating-point values; they can be written in different forms, e.g. 1e6 or 1.00E6, but in memory they are all floating point
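A quick check that the different written forms denote the same value:

```shell
awk 'BEGIN { print (1e6 == 1000000), (1.00E6 == 1e6) }'   # -> 1 1
```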

Variables

  • user-defined
  • built-in
  • fields

Since variable types are never declared, awk infers a variable's type from context and converts between string and numeric as needed.

Before initialization, a variable's string value defaults to "" (the null string) and its numeric value to 0

Built-in Variables

Below are the built-in variables. FILENAME is set anew each time a file is read; FNR, NF, and NR are reset each time a record is read.

VARIABLE MEANING DEFAULT
ARGC number of command line arguments -
ARGV array of command line arguments -
FILENAME name of current input file -
FNR record number in current file -
FS controls the input field separator “ “
NF number of fields in current record -
NR number of records read so far -
OFMT output format for numbers “%.6g”
OFS output field separator “ “
ORS output record separator “\n”
RLENGTH length of string matched by match function -
RS controls the input record separator “\n”
RSTART start of string matched by match function -
SUBSEP subscript separator “\034”
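A small sketch of NR, NF, and OFS on inline input; note OFS only shows up once the record is rebuilt, e.g. by assigning to a field:

```shell
# NR: records read so far, NF: fields in the current record
printf 'a b c\nd e\n' | awk '{ print NR, NF }'
# -> 1 3
# -> 2 2

# a field assignment forces $0 to be rebuilt with the new OFS
printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { $1 = $1; print }'   # -> a-b-c
```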

Field Variables

The field variables of the current record run from $1 to $NF, and $0 holds the whole line. Run a few examples to get a feel for them

# divide the second field by 1000 and print
awk '{ $2 = $2 / 1000; print }' countries

USSR 8.649 275 Asia
Canada 3.852 25 North America
China 3.705 1032 Asia
...

Replace North America and South America with abbreviations

awk 'BEGIN { FS = OFS = "\t" }
$4 == "North America" { $4 = "NA" }
$4 == "South America" { $4 = "SA" }
{print}
' countries
USSR 8649 275 Asia
Canada 3852 25 NA
China 3705 1032 Asia
USA 3615 237 NA
Brazil 286 134 SA
India 1267 746 Asia
Mexico 762 78 NA
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe

PS: I hadn't noticed before, but the above is in fact an example of multi-case replacement

There are some more surprising usages: $(NF - 1) gets the second-to-last field. A nonexistent field such as $(NF + 1) defaults to the null string, and a new field can be created by assignment; the example below appends a fifth column to the data

awk 'BEGIN { FS = OFS = "\t" }; { $5 = 1000 * $3 / $2; print }' countries 
USSR 8649 275 Asia 31.7956
Canada 3852 25 North America 6.49013
...

Arithmetic Operators

awk provides the usual arithmetic operators: +, -, *, /, % and ^.

Comparison Operators

The usual comparisons are supported: <, <=, ==, !=, >= and >, plus the match operators ~ and !~. A comparison evaluates to either 1 (true) or 0 (false).
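The 1/0 result of a comparison can be printed directly:

```shell
awk 'BEGIN { print (2 < 3), (2 == 3), ("Canada" < "China") }'   # -> 1 0 1
```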

Logical Operators

The logical operators are &&, || and !

Conditional Expressions

expr1 ? expr2 : expr3 behaves as in Java. The example below prints the reciprocal of $1, or a message if $1 is 0

awk '{ print ($1 !=0 ? 1/$1 : "$1 is zero, line " NR) }' 

Assignment Operators

var = expr. The example below sums the population of all Asian countries

awk '$4 == "Asia" { pop = pop + $3; n = n + 1}
END { print "Total population of the ", n, "Asian countries is", pop, "million"}' countries
Total population of the 4 Asian countries is 2173 million

Find the country with the largest population

awk '$3 > maxpop {maxpop = $3; country = $1}
END { print "country with largest population:", country, maxpop }' countries
country with largest population: China 1032

Increment and Decrement Operators

n = n + 1 is usually written ++n or n++. The difference shows in assignments: i = n++ assigns the original value of n and then increments, while i = ++n increments first and then assigns

awk 'BEGIN { n=1; i=n++ }; END { print i }' countries
1
awk 'BEGIN { n=1; i=++n }; END { print i }' countries
2

Built-In Arithmetic Functions

FUNCTION VALUE RETURNED
atan2(y, x)   arctangent of y/x in the range -pi to pi
cos(x)        cosine of x, with x in radians
exp(x)        exponential function of x, e^x
int(x)        integer part of x; truncated towards 0 when x > 0
log(x)        natural (base e) logarithm of x
rand()        random number r, where 0 <= r < 1
sin(x)        sine of x, with x in radians
sqrt(x)       square root of x
srand(x)      x is new seed for rand()
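A few of the table's functions evaluated in a BEGIN block:

```shell
awk 'BEGIN { print sqrt(16), int(3.9), int(-3.9), exp(0), log(1) }'
# -> 4 3 -3 1 0  (int truncates toward zero)
```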

String Operators

awk supports only one string operation, concatenation, which requires no operator at all; the example below prefixes every line with its line number and a colon

awk '{ print NR ":" $0 }' countries                  
1:USSR 8649 275 Asia
2:Canada 3852 25 North America
...

Strings as Regular Expressions

awk 'BEGIN { digits = "^[0-9]+$" }; $2 ~ digits' countries. Regular expressions can be assembled dynamically from strings, so the program below is also legal

BEGIN {
    sign = "[+-]?"
    decimal = "[0-9]+[.]?[0-9]*"
    fraction = "[.][0-9]+"
    exponent = "([eE]" sign "[0-9]+)?"
    number = "^" sign "(" decimal "|" fraction ")" exponent "$"
}
$0 ~ number

Built-In String Functions

Function Description
gsub(r, s)        substitute s for r globally in $0, return number of substitutions made
gsub(r, s, t)     substitute s for r globally in string t, return number of substitutions made
index(s, t)       return first position of string t in s, or 0 if t is not present
length(s)         return number of characters in s
match(s, r)       test whether s contains a substring matched by r, return index or 0; sets RSTART and RLENGTH
split(s, a)       split s into array a on FS, return number of fields
split(s, a, fs)   split s into array a on field separator fs, return number of fields
sprintf(fmt, expr-list)  return expr-list formatted according to format string fmt
sub(r, s)         substitute s for the leftmost longest substring of $0 matched by r, return number of substitutions made
sub(r, s, t)      substitute s for the leftmost longest substring of t matched by r, return number of substitutions made
substr(s, p)      return suffix of s starting at position p
substr(s, p, n)   return substring of s of length n starting at position p
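A short sketch exercising a few entries from the table:

```shell
awk 'BEGIN {
    s = "North America"
    print length(s)               # 13
    print index(s, "Am")          # 7 (1-based position)
    print substr(s, 1, 5)         # North
    n = split(s, a, " ")
    print n, a[2]                 # 2 America
    t = s
    print gsub(/ /, "_", t), t    # 1 North_America
}'
```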


Getting Started

A quick start by example. Given the structured data below:

  • compute the total pay of everyone whose working hours > 0
  • show the users whose hours = 0
cat emp.data
#name, pay rate per hour, work time(hour)
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

awk '$3 == 0 {print $1}' emp.data
Beth
Dan

awk '$3 > 0 {print $1, $2 * $3}' emp.data
Kathy 40
Mark 100
Mary 121
Susie 76.5

The Structure of an AWK Program

The form is pattern { action }: when the pattern evaluates to true, the action runs; then the next line is processed, and so on until the input ends.

Running an AWK Program

awk 'program' input files; for multiple files, awk 'program' input file1 file2; with no files, input comes from standard input

The program part can also live in a file: awk -f progfile optional_list_of_input_files

Simple Output

Print Every Line: with no pattern, every line is printed; { print } and { print $0 } are equivalent, both printing the whole line

Print Certain Fields { print $1, $3 }

NF, the Number of Fields: the built-in variable NF holds the number of fields; the example below prints the field count, the name, and the last field

awk '{print NF, $1, $NF}' emp.data
3 Beth 0
3 Dan 0
...

Computing and Printing: fields can be used directly in computations, e.g. { print $1, $2 * $3 }

Printing Line Numbers: NR holds the number of records (lines) read so far

awk '{print NR, $0}' emp.data
1 Beth 4.00 0
2 Dan 3.75 0
...

Putting Text in the Output: mix your own text into the output

awk '{print "total pay for", $1, "is", $2 * $3}' emp.data
total pay for Beth is 0
total pay for Dan is 0
total pay for Kathy is 40
total pay for Mark is 100
total pay for Mary is 121
total pay for Susie is 76.5

Fancier Output

print only does simple output; for richer formatting, use printf

Lining Up Fields: printf has the form printf(format, value1, value2, ..., valuen)

awk '{printf("total pay for %s is %.2f\n", $1, $2 * $3)}' emp.data
total pay for Beth is 0.00
total pay for Dan is 0.00
total pay for Kathy is 40.00
total pay for Mark is 100.00
total pay for Mary is 121.00
total pay for Susie is 76.50

awk '{printf("%-8s $%6.2f\n", $1, $2 * $3)}' emp.data
Beth $ 0.00
Dan $ 0.00
Kathy $ 40.00
Mark $100.00
Mary $121.00
Susie $ 76.50

Sorting the Output: combine with a pipe to sort

awk '{printf("%6.2f %s\n", $2 * $3, $0)}' emp.data | sort
0.00 Beth 4.00 0
0.00 Dan 3.75 0
40.00 Kathy 4.00 10
76.50 Susie 4.25 18
100.00 Mark 5.00 20
121.00 Mary 5.50 22

Selection

Awk patterns can be used to select data

Selection by Comparison: filter by comparison

# hourly rate of at least 5
awk '$2 >= 5' emp.data
Mark 5.00 20
Mary 5.50 22

Selection by Computation: filter by a computed value

awk '$2 * $3 >= 50 { printf("$%.2f for %s\n", $2 * $3, $1)}' emp.data
$100.00 for Mark
$121.00 for Mary
$76.50 for Susie

Selection by Text Content: select by matching text; == is an exact match, /match/ means contains

awk '$1 == "Susie"' emp.data
Susie 4.25 18

awk '/Susie/' emp.data
Susie 4.25 18

Combinations of Patterns: combine conditions with || and &&

awk '$2 >= 4 || $3 >= 20' emp.data
Beth 4.00 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

Written as two separate patterns with no logical operator, a line satisfying both conditions would be printed twice.

$2 >= 4 || $3 >= 20 is equivalent to !($2 < 4 && $3 < 20)
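The equivalence can be verified directly, recreating emp.data inline:

```shell
set -e
cd "$(mktemp -d)"
cat > emp.data <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

awk '$2 >= 4 || $3 >= 20' emp.data > out1
awk '!($2 < 4 && $3 < 20)' emp.data > out2
cmp -s out1 out2 && echo identical    # -> identical
```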

Data Validation: awk makes an excellent data-validation tool

# print lines whose field count is not 3
awk 'NF != 3 {print $0, "number of fields is not equal to 3"}' emp.data
# print lines whose hourly rate is below 3.35
awk '$2 < 3.35 { print $0, "rate is below minimum wage" }' emp.data

BEGIN and END: comparable to before/after hooks around the input

awk 'BEGIN { print "NAME   RATE    HOURS"; print ""} { print }' emp.data 
NAME RATE HOURS

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

Computing with AWK

The action in pattern { action } is a sequence of statements separated by newlines or semicolons. This section covers some string and numeric operations, a few built-in variables, and user-defined variables, which need no declaration.

Counting: count the users who worked more than 15 hours

awk ' $3 > 15 { emp++; } END{ print emp, "employees worked more than 15 hours" }' emp.data 
3 employees worked more than 15 hours

Computing Sums and Averages: totals and averages

awk 'END { print NR, "employees" }' emp.data                                              
6 employees

awk '{ pay = pay + $2 * $3 }
END {
print NR, "employees"
print "total pay is", pay
print "average pay is", pay/NR
}' emp.data
6 employees
total pay is 337.5
average pay is 56.25

Handling Text: awk can process text, and its variables can hold both numbers and strings; the example below shows the highest-paid user

awk '                       
$2 > maxrate { maxrate = $2; maxemp = $1 }
END { print "highest hourly rate:", maxrate, "for", maxemp }
' emp.data
highest hourly rate: 5.50 for Mary

As you can see, numeric variables default to an initial value of 0

String Concatenation: joining strings

awk '{ names = names $1 "  " }
END { print names }' emp.data
Beth Dan Kathy Mark Mary Susie

The names variable starts out as the null string

Printing the Last Input Line: print the final line

awk '{ last = $0 } END { print last }' emp.data 
Susie 4.25 18

Built-in Functions: awk ships with many built-in functions, e.g. length for string length

awk '{ print $1, length($1) }' emp.data 
Beth 4
Dan 3
Kathy 5
Mark 4
Mary 4
Susie 5

Counting Lines, Words and Characters: use length, NF, and NR to gather basic line statistics; for ease of counting, every field is treated as a string

awk '{                                 
nc = nc + length($0) + 1
nw = nw + NF
}
END { print NR, "lines,", nw, "words,", nc, "characters"}' emp.data
6 lines, 18 words, 88 characters

In nc = nc + length($0) + 1, the + 1 accounts for the newline
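The count can be cross-checked against wc (input recreated inline; the numbers depend on the sample file):

```shell
cd "$(mktemp -d)"
printf 'one two\nthree\n' > f

awk '{ nc = nc + length($0) + 1; nw = nw + NF }
END { print NR, "lines,", nw, "words,", nc, "characters" }' f
# -> 2 lines, 3 words, 14 characters

wc f    # wc reports the same three counts
```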

Control-Flow Statements

Flow control in awk is essentially the same as in C; these control statements may only appear inside actions

If-Else Statement

Compute the total and average pay of everyone whose hourly rate exceeds 6; if-else controls which message gets printed

awk '    
$2 > 6 { n = n+1; pay = pay+$2*$3 }
END {
if (n > 0)
print n, "employees, total pay is", pay, "average pay is", pay/n
else
print "no employees are paid more than $6/hour"
}' emp.data
no employees are paid more than $6/hour

While Statement

while = condition + body. Below we implement a savings calculator; the formula is value = amount * (1 + rate) ^ years

cat interest1
# interest1 - compute compound interest
#   input:  amount rate years
#   output: compounded value at the end of each year

{
    i = 1
    while (i <= $3) {
        printf("\t%.2f\n", $1 * (1 + $2) ^ i)
        i = i + 1
    }
}

awk -f interest1
1000 .06 5
1060.00
1123.60
1191.02
1262.48
1338.23

For Statement

The same computation, written with for

cat interest2
# interest2 - compute compound interest
#   input:  amount rate years
#   output: compounded value at the end of each year

{
    for (i = 1; i <= $3; i++)
        printf("\t%.2f\n", $1 * (1 + $2) ^ i)
}

awk -f interest2
1000 .06 5
1060.00
1123.60
1191.02
1262.48
1338.23

Arrays

awk supports arrays. In the experiment below, the action stores each line into an array, and END prints them in reverse with a while loop

awk '
{ line[NR] = $0 }
END {
    i = NR
    while (i > 0) {
        print line[i]
        i = i - 1
    }
}' emp.data
Susie 4.25 18
Mary 5.50 22
Mark 5.00 20
Kathy 4.00 10
Dan 3.75 0
Beth 4.00 0

A Handful of Useful “One-liners”

A few short but striking awk one-liners

  • print the total number of input lines: awk 'END { print NR }' emp.data
  • print the third line: awk 'NR == 3' emp.data
  • print the last field of every line: awk '{ print $NF }' emp.data
  • print the last field of the last line: awk '{ field = $NF } END { print field }' emp.data
  • print lines with more than 4 fields: awk 'NF > 4' emp.data
  • print lines whose last field is greater than 4: awk '$NF > 4' emp.data
  • replace the first field with the line number: awk '{ $1 = NR; print }' emp.data
  • erase the second field: awk '{ $2 = ""; print }' emp.data
  • print the fields of every line in reverse order: awk '{ for (i = NF; i > 0; i--) printf("%s ", $i); printf("\n") }' emp.data

Video exercises

Tomcat installation

  • Pick a version on the official Tomcat site and download it
  • After extracting, go into the bin folder and run startup.bat to start the server
  • Visit localhost:8080; seeing the page means the install succeeded
  • Run shutdown.bat or close the terminal to stop it

conf -> server.xml is the main configuration file; it holds the port (Connector), domain (Host), and similar settings

Every folder under webapps is a project; you can visit one by appending the project name to the address, e.g. http://localhost:8080/docs

Question

If the Host domain is changed to another value, say www.abc.com, can the page still be reached by typing that address directly into the browser?

No. The local hosts file must be edited. When the browser resolves an address, it first checks the local hosts file and uses any match; only then does it query DNS. DNS certainly has no record for this address, and even if it did, it would not be the one you want.
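A sketch of the hosts entry this would take, using the hypothetical domain from the question (on macOS/Linux the file is /etc/hosts):

```
127.0.0.1   www.abc.com
```

With this line in place, the browser resolves www.abc.com to the local machine without consulting DNS.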

A knock-off ROOT

Copy ROOT under webapps and rename it myproject, then delete everything except the WEB-INF folder. Trim the web.xml inside so that only the web-app node remains; its child nodes can be removed. Next to WEB-INF, add a new index.html containing a simple Hello world. Start tomcat and visit http://localhost:8080/myproject to see the new page

<!DOCTYPE html>
<html>

<head>
<meta charset="UTF-8">
<title>myproject</title>
</head>

<body>
<h1>Wryyyyy.....</h1>
</body>

</html>

myproject_index

Project directory layout

--webapps: the Tomcat server's web directory
  -ROOT
  -myproject: the site's directory name
    -WEB-INF
      -classes: Java code
      -lib: jars the web app depends on
      -web.xml: site configuration file
    -index.html: the default home page
    -static
      -css
        -style.css
      -js
      -img
    -...

Servlet

  • Servlet is a technology SUN developed for building dynamic web content
  • Sun provides an interface in these APIs called Servlet; development takes only two steps
    • write a class that implements the Servlet interface
    • deploy the finished Java class to a web server

A Java program that implements the Servlet interface is called a Servlet

HelloServlet

Goals:

  1. set up the basic experiment skeleton
  2. run the first program

Create a Maven project named javaweb as the umbrella project and delete its src folder; later experiments are added as modules. It stays an empty project; add the base dependencies to the parent pom

<dependencies>
    <!-- https://mvnrepository.com/artifact/javax.servlet/javax.servlet-api -->
    <dependency>
        <groupId>javax.servlet</groupId>
        <artifactId>javax.servlet-api</artifactId>
        <version>4.0.1</version>
        <scope>provided</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/javax.servlet.jsp/javax.servlet.jsp-api -->
    <dependency>
        <groupId>javax.servlet.jsp</groupId>
        <artifactId>javax.servlet.jsp-api</artifactId>
        <version>2.3.3</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

Create a new module: choose Maven project -> Create from archetype -> maven-archetype-webapp, then fill in some basic project information.

A project created from this template lacks the basic directories such as java; create them yourself. The expected layout:

main
├─java
├─webapp
└─resources

After creating the java folder, also right-click it and mark as source root, otherwise Java class files cannot be added.

Afterwards, inspecting the parent and child pom files shows that each references the other

Under the java folder, create HelloServlet extending HttpServlet and override doGet

@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    System.out.println("Into HelloServlet...");
    PrintWriter writer = resp.getWriter();
    writer.print("Hello from servlet!");
}

The server cannot recognize the Java code we wrote on its own, so we must configure the mapping in web.xml under WEB-INF, so that visiting the corresponding url triggers our Java code

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                             http://xmlns.jcp.org/xml/ns/javaee/web-app_4_0.xsd"
         version="4.0"
         metadata-complete="true">
    <!-- register the servlet -->
    <servlet>
        <servlet-name>hello</servlet-name>
        <servlet-class>org.jzeng.servlet.HelloServlet</servlet-class>
    </servlet>
    <!-- the servlet's request path -->
    <servlet-mapping>
        <servlet-name>hello</servlet-name>
        <url-pattern>/hello</url-pattern>
    </servlet-mapping>
</web-app>

That is essentially all the code; next, configure tomcat in idea

Add Configuration... -> + -> Tomcat Server -> local, then pick the locally installed tomcat

Check the Application server, JRE, and port settings. There is a Warning: No artifacts marked for deployment. Open the Deployment tab -> + and choose servlet01:war. Choosing the second option seems to work too. Run the program; a browser opens by default. Visit http://localhost:8080/hello to see the result

servlet01_hello

PS: the page has an Application context field that defaults to /servlet01_war; when set, the address must include that path. If you don't want it, just use /

Servlet class diagram

TODO - draw the servlet class hierarchy in UML tomorrow and briefly cover the code

How Servlet works

TODO

Writing servlet mappings

A mapping can be one-to-one

<servlet-mapping>
    <servlet-name>hello</servlet-name>
    <url-pattern>/hello</url-pattern>
</servlet-mapping>

可以是一对多

<servlet-mapping>
<servlet-name>hello</servlet-name>
<url-pattern>/hello</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>hello</servlet-name>
<url-pattern>/hello1</url-pattern>
</servlet-mapping>

也支持通配符

<servlet-mapping>
<servlet-name>hello</servlet-name>
<url-pattern>*.abc</url-pattern>
</servlet-mapping>
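上面几种 url-pattern 的匹配行为可以用一段纯 Java 粗略模拟一下(简化示意,不是 tomcat 的真实实现,matches 是假设的辅助方法):

```java
public class UrlPatternDemo {
    // 简化模拟:精确匹配(/hello)、路径通配(/xxx/*)、扩展名通配(*.abc)
    static boolean matches(String pattern, String path) {
        if (pattern.equals(path)) return true;            // 一对一精确匹配
        if (pattern.endsWith("/*")) {                     // 路径通配
            String prefix = pattern.substring(0, pattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        if (pattern.startsWith("*.")) {                   // 扩展名通配
            return path.endsWith(pattern.substring(1));
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("/hello", "/hello"));   // true
        System.out.println(matches("/*", "/anything"));    // true
        System.out.println(matches("*.abc", "/a/b.abc"));  // true
        System.out.println(matches("/hello", "/hello1"));  // false
    }
}
```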

编写 ErrorServlet

新加 class 文件

public class ErrorServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
resp.setContentType("text/html");
resp.setCharacterEncoding("utf-8");
PrintWriter writer = resp.getWriter();
writer.print("<h1>404</h1>");
}
}

对应的 web.xml 中添加内容

<servlet>
<servlet-name>error</servlet-name>
<servlet-class>org.jzeng.servlet.ErrorServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>error</servlet-name>
<url-pattern>/*</url-pattern>
</servlet-mapping>

PS: * 不能省略斜杠,不然项目启动会失败

ServletContext

ServletContext 提供了一个 servlet 之间通信的媒介,是位于 servlet 之上的

TODO:简单来个图

实际开发中尽量避免直接使用它,下面介绍的方法都有替代方案:数据共享用 session/cookie,context 参数基本不推荐使用了,转发用重定向,properties 用反射读取。经典白学。。。只做引子

数据共享

目标:

  • 创建第二个 module 实践 ServletContext 实现 servlet 之间的数据共享

实验描述:在 module2 中新建两个 servlet, 分别向 ServletContext 中 set 值和从 ServletContext 中 get 值。然后这两个 servlet 配置到 web.xml 中。配置 tomcat 启动项,将 deployment 下原本的 module1 配置移除,不然两个都会打包,会变慢。启动后,浏览器先访问 localhost:8080/setattr 再访问 localhost:8080/getattr 就可以看到之前 set 的属性被正确获取

public class SetAttrServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
ServletContext context = this.getServletContext();
context.setAttribute("name", "jack");
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}

public class GetAttrServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
ServletContext context = this.getServletContext();
String name = (String)context.getAttribute("name");

resp.setContentType("text/html");
resp.setCharacterEncoding("utf-8");
resp.getWriter().print("my name: " + name);
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}

获取初始化参数

目标:体验 context param 的使用,实际中基本不使用这种方式

描述:在 web.xml 中添加 context param 标签并注册,新写一个 servlet 拿到这个 param 并输出到页面

public class GetContextParamServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
ServletContext context = this.getServletContext();
String name = context.getInitParameter("name");
resp.getWriter().print("manga name: " + name);
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req,resp);
}
}
 <!-- context param testing -->
<context-param>
<param-name>name</param-name>
<param-value>jojo</param-value>
</context-param>
<servlet>
<servlet-name>getContextParam</servlet-name>
<servlet-class>org.jzheng.servlet.GetContextParamServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>getContextParam</servlet-name>
<url-pattern>/getcontextparam</url-pattern>
</servlet-mapping>

转发

目标:体验转发特点

描述:新建一个 servlet 将请求转发到上面的参数初始化实验的地址去

public class ForwardServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
ServletContext context = this.getServletContext();
// point to target address
context.getRequestDispatcher("/getcontextparam").forward(req, resp);
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}
<!-- forward testing -->
<servlet>
<servlet-name>forward</servlet-name>
<servlet-class>org.jzheng.servlet.ForwardServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>forward</servlet-name>
<url-pattern>/forward</url-pattern>
</servlet-mapping>

重启 tomcat,访问 localhost:8080/forward 显示之前实验的页面

TODO: 补一个转发和重定向的简图

PS:转发的请求,地址不会发生变化

加载 Properties

目标:Servlet 读取 resources 文件夹下的 properties 文件内容

描述:resources 文件夹下新建一个 db.properties 文件,写入测试内容。新建 PropertiesServlet 读取这个文件并将信息显示在页面上

public class PropertiesServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
ServletContext context = this.getServletContext();
InputStream is = context.getResourceAsStream("/WEB-INF/classes/db.properties");
Properties prop = new Properties();
prop.load(is);
String uname = (String) prop.get("username");
String pwd = (String) prop.get("password");
resp.getWriter().print("name: " + uname + "; password: " + pwd);
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}
<!-- properties testing -->
<servlet>
<servlet-name>prop</servlet-name>
<servlet-class>org.jzheng.servlet.PropertiesServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>prop</servlet-name>
<url-pattern>/prop</url-pattern>
</servlet-mapping>

访问 localhost:8080/prop 显示 properties 文件中设置的用户名和密码
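脱离容器也可以单独验证 Properties 的解析逻辑,下面用 StringReader 模拟 db.properties 的内容(键值是假设的示例):

```java
import java.io.StringReader;
import java.util.Properties;

public class PropDemo {
    public static void main(String[] args) throws Exception {
        Properties prop = new Properties();
        // 模拟 db.properties 文件的内容
        prop.load(new StringReader("username=root\npassword=123456"));
        System.out.println("name: " + prop.getProperty("username")
                + "; password: " + prop.getProperty("password"));
    }
}
```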

prop 不能提取的问题

如果 properties 是写在 Java class 同目录下的,那么编译的时候并不会被拷贝到 WEB-INF/classes 文件夹中去,需要在 module2 的 pom 中添加如下配置修复

<build>
<!-- 在 build 的时候将工程中的配置文件也一并 copy 到编译文件中,即 target 文件夹下 -->
<resources>
<resource>
<directory>src/main/resources</directory>
<includes>
<include>**/*.properties</include>
<include>**/*.xml</include>
</includes>
</resource>
<resource>
<directory>src/main/java</directory>
<includes>
<include>**/*.properties</include>
<include>**/*.xml</include>
</includes>
<filtering>true</filtering>
</resource>
</resources>
</build>

HttpServletResponse

web 服务器接收到客户端的 Http 请求,会封装两个对象:HttpServletRequest 代表请求,HttpServletResponse 代表响应

设置 response 自动下载

目标:通过设置 response 头信息,实现发送请求后,下载文件的效果

描述:新建 servlet, 设置响应头包含 response.setHeader("Content-Disposition", "attachment;filename="+filename); 即可

public class ResponseServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// URL url = getClass().getClassLoader().getResource("tree.png");
String filePath = "C:\\Users\\jack\\IdeaProjects\\javaweb\\response\\src\\main\\resources\\tree.png";
String fileName = filePath.substring(filePath.lastIndexOf("\\") + 1); // 注意:Windows 路径分隔符是反斜杠,原来误写成 "//" 会取到整个路径
resp.setHeader("Content-Disposition", "attachment;filename=" + fileName);
byte[] buf= new byte[1024];
int len = 0;
InputStream is = new FileInputStream(filePath);
OutputStream os = resp.getOutputStream();
while((len=is.read(buf))>0) {
os.write(buf, 0, len);
}
os.close();
is.close();
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}

设置 response 自动刷新

这个实验用到的技术已经不实用了,但是它最后的效果我挺喜欢的,还是手动撸一遍玩一下

目标:页面显示一个定时刷新的数字验证码

public class ImageServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// auto refresh
resp.setHeader("refresh", "3");
// create image
BufferedImage image = new BufferedImage(80, 20, BufferedImage.TYPE_INT_RGB);
Graphics2D g = (Graphics2D)image.getGraphics();
g.setColor(Color.white);
g.fillRect(0, 0, 80, 20);
g.setColor(Color.BLUE);
g.setFont(new Font(null, Font.BOLD, 20));
g.drawString(makeNum(), 0, 20);

resp.setContentType("image/jpeg");
// no cache
resp.setDateHeader("expires", -1);
resp.setHeader("Cache-Control", "no-cache");
resp.setHeader("Pragma", "no-cache");
ImageIO.write(image, "jpg", resp.getOutputStream());
}

private String makeNum() {
Random random = new Random();
String num = random.nextInt(9999999) + "";
StringBuffer sb = new StringBuffer();
for (int i=0; i<7-num.length(); i++) {
sb.append("0");
}
num = sb.toString() + num;
return num;
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}
<servlet>
<servlet-name>imageServlet</servlet-name>
<servlet-class>com.jzheng.servlet.ImageServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>imageServlet</servlet-name>
<url-pattern>/image</url-pattern>
</servlet-mapping>
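makeNum 里手动补零的循环也可以用 String.format 一行完成(示意写法,与原逻辑同样生成 7 位零填充的数字串):

```java
import java.util.Random;

public class NumDemo {
    // %07d:不足 7 位时左侧补 0
    static String makeNum() {
        return String.format("%07d", new Random().nextInt(9999999));
    }

    public static void main(String[] args) {
        System.out.println(makeNum());
    }
}
```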

实现重定向

redirect 和 forward 的区别:redirect 后浏览器地址(url)会变,状态码 302;forward 不会变,状态码 200

实验目标:体验一下 redirect 和 jsp

步骤描述:首页新建一个表单,同时新建一个 RequestServlet 作为表单的提交地址。设置表单 action 属性指向这个 servlet。servlet 的末尾添加 redirect 的逻辑指向 success.jsp

新表单

<html>
<body>
<h2>Hello World!</h2>
<%--${pageContext.request.contextPath} for project --%>
<form action="${pageContext.request.contextPath}/login" method="get">
name: <input type="text" name="username"><br>
pwd: <input type="password" name="password"><br>
<input type="submit">
</form>
</body>
</html>
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
System.out.println("in to request servlet");
String uname = req.getParameter("username");
String pwd = req.getParameter("password");
System.out.println(uname + ";" + pwd);
resp.sendRedirect("/success.jsp");
}

重定向 jsp

<html>
<body>
<h2>Success!</h2>
</body>
</html>

别忘了在 web.xml 那边注册新加的 servlet

<servlet>
<servlet-name>requestServlet</servlet-name>
<servlet-class>com.jzheng.servlet.RequestServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>requestServlet</servlet-name>
<url-pattern>/login</url-pattern>
</servlet-mapping>

Cookies

会话: 可以简单的理解为从打开浏览器访问页面到关闭浏览器,这一段时间内,浏览器和服务器之间的通信关系

有状态的会话: 需要通过 session 或者 cookie 记录这个状态。cookie 记录在客户端,session 记录在服务器端

Cookies 实验01

目的:通过在 response 中设置 cookie 的方式记录客户端访问时间

步骤:新建 servlet,并在处理 request 的时候,在对应的 response 中返回当前时间。如果是第一次访问,则打印:这是第一次访问

实现:

创建 servlet,接收 req 并检查其中的 cookies 如果没有 lastlogin 相关的 cookie 则打印 第一次访问,有则打印上次访问时间

public class CookieServlet01 extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
req.setCharacterEncoding("utf-8");
resp.setCharacterEncoding("utf-8");

// loop cookies and get login cookie if exist
Cookie loginCookie = null;
Cookie[] cookies = req.getCookies();
// 第一次访问时可能没有任何 cookie,req.getCookies() 会返回 null,先判空避免 NPE
if (cookies != null) {
for (Cookie cookie : cookies) {
if (cookie.getName().equals("lastlogin")) {
loginCookie = cookie;
}
}
}

// if it's first time login, print log. else print last login time
if (loginCookie == null) {
resp.getWriter().print("it's the first time to login...");
} else {
String strDateFormat = "yyyy-MM-dd HH:mm:ss";
SimpleDateFormat sdf = new SimpleDateFormat(strDateFormat);
resp.getWriter().print("last login time: " + sdf.format(new Date(Long.parseLong(loginCookie.getValue()))));
}

// update login cookie
Cookie updateLoginTime = new Cookie("lastlogin", System.currentTimeMillis()+"");
resp.addCookie(updateLoginTime);
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}

配置 web.xml

<servlet>
<servlet-name>cookie01</servlet-name>
<servlet-class>com.jzheng.CookieServlet01</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>cookie01</servlet-name>
<url-pattern>/c1</url-pattern>
</servlet-mapping>

启动 tomcat 第一次访问数据展示. 显示第一次登录信息,访问后,servlet 会向 resp 中插入 login cookie 的信息

cookie_c1

第二次访问数据展示,时间显示为上次访问 servlet 的时间了。

cookie_c2

观察 Network 中的 response 也能发现一些有趣的信息:它会显示默认的 cookie 有效时间 20mins,还会显示 resp 中为 cookie 设置的值

cookie_c1_2

Cookies 实验02

关闭浏览器,再访问这个网址的时候,都会显示第一次登录(IE) 那么怎么为他设置一个有效期限呢,可以通过 maxAge 属性. 设置后可以看到 resp 中多了过期时间的属性,关闭浏览器再登录还是可以看到时间

cookie_c3

  • 一个 cookie 只能保存一个信息
  • 一个 web 站点可以给浏览器发送多个 cookie, 最多存放 20 个 cookie
  • cookie 大小有限制 4kb
  • 浏览器的 cookie 上限是 300 个

删除 cookie 的方式:

  1. 不设置有效期,关闭浏览器后自动清理
  2. 设置有效期为 0,updateLoginTime.setMaxAge(0); 访问 c1 后访问 c2 可以看到控制台中 login 的 cookie 被删掉了

问题

servlet 中尝试使用 cookie.getName() == "xxx" 的写法,即使 name 和 xxx 的值是一样的还是会判 false,为什么?

我猜测,可能 name 是通过 new String() 的方式生成的,具体得看底层实现,测试一下
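这个猜测可以用一小段纯 Java 验证:new String() 创建的对象不在字符串常量池里,== 比较的是引用,所以内容相同也会判 false:

```java
public class StringEqDemo {
    public static void main(String[] args) {
        String name = new String("lastlogin"); // 运行时新建对象,不复用常量池
        System.out.println(name == "lastlogin");      // false:引用不同
        System.out.println(name.equals("lastlogin")); // true:内容相同
    }
}
```

所以比较 cookie name 永远应该用 equals。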

既然 session 是 server 和 client 之间的对话,那多 server 的情况下,这个 session 是怎么维护的?听 yi 的说法,我司貌似是登录的时候选定对应的 server,并不是存在公共的地方

Session

什么是 Session:

  • 服务器会给每个用户(浏览器)创建一个 Session 对象
  • 一个 Session 独占一个浏览器,只要浏览器没关闭,这个 session 就存在
  • 用户登录之后,整个网站都可以访问;保存用户、购物车的信息

Session 和 cookie 的区别

  • cookie 把用户数据写到浏览器端保存
  • session 把数据写到用户独占的 session 中,保存在 server 端(保存重要数据,避免资源浪费)
  • session 由服务创建

使用场景:

  • 保存一个登录用户的信息
  • 购物车信息
  • 整个网站中经常使用的数据

TODO:画个图

思考:我是不是可以通过拿到用户的 session id 来 hack 进系统?

比 session 还高一层的变量叫 ServletContext,在 JSP 中叫 application

实验 01

测试 session 的生命周期。session 是当你打开网页的时候就会生成的一个变量。实验中,我们在 servlet 的 req 对象中取得 session 对象,并判断是否存在,并打印 log

public class SessionServlet01 extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// set encoding
req.setCharacterEncoding("UTF-8");
resp.setCharacterEncoding("UTF-8");
resp.setContentType("text/html;charset=utf-8");

// get session and check
HttpSession session = req.getSession();
session.setAttribute("name", "jack");
if (session.isNew()) {
resp.getWriter().write("session is new: " + session.getId());
} else {
resp.getWriter().write("session already exist: " + session.getId());
}
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}
<servlet>
<servlet-name>session01</servlet-name>
<servlet-class>com.jzheng.SessionServlet01</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>session01</servlet-name>
<url-pattern>/s1</url-pattern>
</servlet-mapping>

启动 tomcat,自动弹出首页,这时 session 已经建立,再访问 s1 显示已经存在。查看 Network request 和 Application 的 cookie 信息可以看到,打印的 session id 和 request/cookie 中的 JSESSIONID 是对应的

PS:如果新启动一个 browser,直接访问 s1 会显示是新 session 的

PPS: 在 web 的实现中,它会将 session id 塞到名为 JSESSIONID 的 cookie 中(貌似没有代码体现)

实验02

新建一个 servlet 获取上面实验中塞的值,并打印

public class SessionServlet02 extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// set encoding
req.setCharacterEncoding("UTF-8");
resp.setCharacterEncoding("UTF-8");
resp.setContentType("text/html;charset=utf-8");

// get session and check
HttpSession session = req.getSession();

System.out.println("session attribute name, value is: " + session.getAttribute("name"));
}
}

再配置 web.xml,启动 tomcat,先访问 s1 再访问 s2,可以看到终端打印 name 的 log

实验03

通过 session 实现对象的存储

新建一个 person 类

public class Person {
private String name;
private int age;
// constructor + getter/setter + toString
}

将 SessionServlet01 中设置 name 的语句改为设置对象 session.setAttribute("name", new Person("jack", 2));

启动 tomcat,访问 s1 再访问 s2,终端打印对象信息 session attribute name, value is: Person{name='jack', age=2}

实验04

注销 session,可以直接 invalid 的方式注销

public class SessionServlet03 extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
HttpSession session = req.getSession();
session.removeAttribute("name");
session.invalidate();
}
}

更新 web.xml,把这个 servlet 映射到 s3 节点。启动 tomcat,访问 s1 然后访问 s3 再访问 s1,发现 session id 变了

除了上面的方式还可以通过配置 web.xml 中的 session-config 达到目的

<session-config>
<!-- n minutes -->
<session-timeout>1</session-timeout>
</session-config>

启动 tomcat 访问 s1 然后等一分钟再刷新,发现 id 改变

JSP

JSP 是 Java server pages 的简写,和 servlet 一样,用于动态 web 技术

最大的特点是:写 JSP 就像写 HTML 一样

区别:

  • Html 只给用户提供静态的数据
  • JSP 页面中可以嵌入 Java 代码,提供动态数据

JSP 原理

思路:JSP 怎么执行的?

新建一个 jsp-investigation project 举例。我当前的实验环境是 Mac + idea 社区版 + smart tomcat,启动项目后可以在 /Users/myname/.SmartTomcat/javaweb/jsp-investigation/work/Catalina/localhost/jsp-investigation/ 下看到对应的 jsp 转化之后的 Java 文件。当访问 jsp 文件时才会动态生成。

浏览器向服务器发送请求,不管访问什么资源,其实都是在访问 Servlet,JSP 最终也是转化为 servlet

所以大致流程可以表示为 用户 -> servlet -> jsp -> java(这一步的转化由 tomcat 内置的 Jasper 引擎完成) -> class -> html -> return -> user 这么一个过程

TODO 图解

JSP 页面中,Java 代码会原封不动地输出到翻译后的 Java 文件里,HTML 代码则会转化成 out.write("xxx") 的形式输出

下面是自带的 index.jsp 翻译后的 Java 文件

// comment + package import

// index_jsp 继承自 HttpJspBase, 再查看他的继承关系 HttpJspBase extends HttpServlet implements HttpJspPage,可以看出来,这个 HttpJspBase 本质还是一个 servlet
// 拿到 request + response 处理
public final class index_jsp extends org.apache.jasper.runtime.HttpJspBase
implements org.apache.jasper.runtime.JspSourceDependent,
org.apache.jasper.runtime.JspSourceImports {
// 移除一些变量声明方法。。。

// 三个主体方法,init + destroy + service,service 包含主要转化过程
public void _jspInit() {
}

public void _jspDestroy() {
}

public void _jspService(final javax.servlet.http.HttpServletRequest request, final javax.servlet.http.HttpServletResponse response)
throws java.io.IOException, javax.servlet.ServletException {

// request type 检测
if (!javax.servlet.DispatcherType.ERROR.equals(request.getDispatcherType())) {
final java.lang.String _jspx_method = request.getMethod();
if ("OPTIONS".equals(_jspx_method)) {
response.setHeader("Allow","GET, HEAD, POST, OPTIONS");
return;
}
if (!"GET".equals(_jspx_method) && !"POST".equals(_jspx_method) && !"HEAD".equals(_jspx_method)) {
response.setHeader("Allow","GET, HEAD, POST, OPTIONS");
response.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED, "JSPs only permit GET, POST or HEAD. Jasper also permits OPTIONS");
return;
}
}

// 声明一些内置变量
final javax.servlet.jsp.PageContext pageContext;
javax.servlet.http.HttpSession session = null;
// servlet context 命名为 application
final javax.servlet.ServletContext application;
final javax.servlet.ServletConfig config;
javax.servlet.jsp.JspWriter out = null;
final java.lang.Object page = this;
javax.servlet.jsp.JspWriter _jspx_out = null;
javax.servlet.jsp.PageContext _jspx_page_context = null;


// 省略异常处理,下面输出页面内容
response.setContentType("text/html");
pageContext = _jspxFactory.getPageContext(this, request, response,
null, true, 8192, true);
_jspx_page_context = pageContext;
application = pageContext.getServletContext();
config = pageContext.getServletConfig();
session = pageContext.getSession();
out = pageContext.getOut();
_jspx_out = out;

out.write("<html>\n");
out.write("<body>\n");
out.write("<h2>Hello World!</h2>\n");
out.write("</body>\n");
out.write("</html>\n");
}
}

JSP 基础语法

了解即可。新建普通 maven 项目,右键 module -> Add Framework Support -> Web Application。通过这种方式创建的 web 项目,更新的时候,有热更新的效果

<!-- jsp 表达式 -->
<%= new java.util.Date() %>

<!-- 脚本片段 -->
<%
int sum = 0;
for (int i = 0; i < 100; i++) {
sum+=i;
}
out.println("result = " + sum);
%>

<!-- 片段中间嵌入 html -->
<%
int x = 10;
out.print("x = " + x);
%>
<p> 这是片段分割 </p>
<%
int y = 10;
out.print("y = " + y);
%>

<!-- JSP 批量生产网页元素 -->
<% for (int i = 0; i < 3; i++) { %>
<h1>Hello <%=i%></h1>
<% } %>

<!-- jsp 声明,声明的内容会放在类中,其他的代码段则会生成在 _jspService 方法中 -->
<%!
static {
System.out.println("Loading servlet!");
}
private int globalVar = 0;

public void test() {
System.out.println("into method test...");
}
%>

<!-- HTML COMMENT -->
<%-- JSP COMMENT --%>
jsp 的注释并不会在页面源代码中显示,html 的可以

和 index.jsp 同级目录下新建一个页面 jsp2.jsp 并在 body 中写一个错误代码片段 <% int x=1/0; %> 访问 http://localhost:8080/jsp2.jsp 可以看到页面抛出异常

这个处理不是很好,可以在这个页面的头部添加 <%@ page errorPage="error/500.jsp" %> 指定错误页面

这个页面还可以配置错误的图片,不过本地测试的时候需要重启 idea 才能看到,不然图片是损坏状态

除了上面的配置办法,还可以在 web.xml 中设置这些页面

<error-page>
<error-code>404</error-code>
<location>/error/404.jsp</location>
</error-page>

<error-page>
<error-code>500</error-code>
<location>/error/500.jsp</location>
</error-page>

随便访问一个不存在的页面即可得到 404 error page, http://localhost:8080/jsp2asdfasd.jsp

include 标签

<%@include file="common/header.jsp"%>
<h> 我是身体 </h>
<%@include file="common/footer.jsp"%>

也可以使用标签的形式,效果一样

<jsp:include page="/common/header.jsp"/>
<h> 我是身体 </h>
<jsp:include page="/common/footer.jsp"/>

区别:第一种在翻译时把被包含页面的源码直接合并进同一个 class 文件;第二种是运行时动态引入,两个页面分别编译,再把输出拼在一起

九大内置对象

  • PageContext 存东西
  • Request 存东西
  • Response
  • Session 存东西
  • Application - ServletContext 存东西
  • config - ServletConfig
  • out
  • page
  • exception

实验01 测试内置对象作用域

新建一个 jsp 页面再里面通过内置的对象设置值,并通过 pageContext.findAttribute 查找对应的值

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
<title>demo01</title>
</head>
<body>

<%
pageContext.setAttribute("name1", "val1"); // 一个页面中有效
request.setAttribute("name2", "val2"); // 一次请求中有效
session.setAttribute("name3", "val3"); // 一次会话中有效
application.setAttribute("name4", "val4"); // 服务器工作时一直有效

String n1 = (String)pageContext.findAttribute("name1");
%>

<h1>取得值为</h1>
<h3>s1: ${name1}</h3>
<h3>s1<%=n1%></h3>
<h3>${name2}</h3>
<h3>${name3}</h3>
<h3>${name4}</h3>
<h3>${name5}</h3>

</body>
</html>

不得不说,这里视频教学有问题,很大程度上误导了我,查了半天,还是通过其他看这个视频的人的 project 才找到根源。如果你使用 EL 表达式,就不需要使用 pageContext.findAttribute 方法了,直接拿就行。如果用的是 <%=%> 这种方式,才需要用前面提到的方式拿值。

如果使用 EL 表达式,值为 null 则页面不显示,如果用 <%=%> 值为空则页面显示 null 字样

实验02

新建一个 jsp page, get 语句同上,先访问上面的页面再访问这个新页面,只有 session 和 application level 的变量值可以显示

实验03

pageContext.setAttribute(key, value, scope) 支持直接设置作用域

JSP 标签

EL表达式:${}

  • 获取数据
  • 执行运算
  • 获取 web 开发的常用对象

实验01 EL 标签转发

新建 jsptag.jsp 通过 jsp:forward + param 将参数转发给 jsptag2.jsp 并再页面上显示

<jsp:forward page="/jsptag2.jsp">
<jsp:param name="name" value="jack"/>
<jsp:param name="age" value="100"/>
</jsp:forward>

显示页面代码

name: <%=request.getParameter("name")%><br>
age: <%=request.getParameter("age")%><br>

name: ${param.get("name")}<br>
age: ${param.get("age")}<br>

JSTL

JSTL 是为了弥补 HTML 标签的不足

  • 引入 taglib
  • 使用标签

实验时抛异常 org.apache.jasper.JasperException: 无法在web.xml或使用此应用程序部署的jar文件中解析绝对uri:[http://java.sun.com/jsp/jstl/core]

需要将

  • jstl-1.2.jar
  • standard-1.1.2.jar
  • jstl-impl-1.2.jar
  • jstl-api-1.2.jar

拷贝到 tomcat 的 lib 文件夹中即可,需要重启 tomcat

介绍了 if, when 和 foreach 语法
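一个最小的 c:if / c:forEach 片段示意(假设 jar 包已按上面方式放好,taglib 按如下方式引入):

```jsp
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>

<c:if test="${param.score > 60}">
    <p>及格了</p>
</c:if>

<c:forEach var="i" begin="1" end="3">
    <p>第 ${i} 次输出</p>
</c:forEach>
```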

java bean

实体类

Java bean 的特定写法

  • 必须有无参构造
  • 属性私有化
  • 必须有对应的 get/set 方法

一般用来和数据库字段做映射 ORM

ORM:对象关系映射

  • 表 - 类
  • 字段 - 属性
  • 行 - 对象
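按上面的规则,一个最小的 JavaBean 示例(字段与表字段的对应仅为示意):

```java
public class User {
    private String name; // 对应表字段 name
    private int age;     // 对应表字段 age

    public User() {}     // JavaBean 要求的无参构造

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```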

MVC三层架构

  • Controller: 接收请求;交给业务层处理对应的代码;控制视图跳转
  • View: 展示数据,提供链接发起 servlet 请求
  • Model: 对应 service + Dao 部分

过滤器 Filter

为某种特殊的需求提供统一的处理方式

  1. 配置依赖
  2. 实现 Filter 接口
  3. 在 web.xml 中配置 filter 参数

实验描述:新建一个 servlet 类,返回中文内容。新建一个 filter 实现类提供 utf-8 转码。在 web.xml 中为 servlet 配置两个入口,一个入口经过 filter,另一个不经过。得到的结果,经过 filter 的中文能正常显示,没有经过的为乱码

新建 servlet

public class Show extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
resp.getWriter().print("你好,世界");
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}

配置 web.xml

<servlet>
<servlet-name>Show</servlet-name>
<servlet-class>org.jzheng.servlet.Show</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>Show</servlet-name>
<url-pattern>/show</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>Show</servlet-name>
<url-pattern>/filter/show</url-pattern>
</servlet-mapping>

两个地址都能访问且给出乱码。添加 filter 实现

public class EncodingFilter implements Filter {
@Override
public void init(FilterConfig filterConfig) throws ServletException {
System.out.println("filter init...");
}

@Override
public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException {
servletResponse.setCharacterEncoding("UTF-8");
servletResponse.setContentType("text/html;charset=UTF-8");
filterChain.doFilter(servletRequest, servletResponse);
}

@Override
public void destroy() {
System.out.println("filter destroy...");
}
}

配置 web.xml

<filter>
<filter-name>EncodingFilter</filter-name>
<filter-class>org.jzheng.servlet.EncodingFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>EncodingFilter</filter-name>
<url-pattern>/filter/*</url-pattern>
</filter-mapping>

重启服务器,访问 /show 还是乱码,访问 /filter/show 中文显示正常

PS: 从 server 的 log 可以看出 filter 在 server 启动时做一次 init, server 停止时做一次销毁

监听器 Listener

实验描述:新建一个监听器统计在线人数

  1. 实现监听器接口
  2. 注册到 web.xml
public class CountListener implements HttpSessionListener {
@Override
public void sessionCreated(HttpSessionEvent httpSessionEvent) {
ServletContext context = httpSessionEvent.getSession().getServletContext();
Integer onlineCount = (Integer) context.getAttribute("OnlineCount");
if (onlineCount == null) {
onlineCount = 1;
} else {
onlineCount += 1;
}
context.setAttribute("OnlineCount", onlineCount );
}

@Override
public void sessionDestroyed(HttpSessionEvent httpSessionEvent) {
ServletContext context = httpSessionEvent.getSession().getServletContext();
Integer onlineCount = (Integer) context.getAttribute("OnlineCount");
if (onlineCount == null) {
onlineCount = 0;
} else {
onlineCount -= 1;
}
context.setAttribute("OnlineCount", onlineCount );
}
}
<listener>
<listener-class>org.jzheng.servlet.CountListener</listener-class>
</listener>

restart 之后显示 3 个人,貌似是因为服务器默认会启动几个 session,redeploy 之后修复了。多开几个不同类型的浏览器,session 数量会上升
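顺带记一笔:上面 getAttribute 再 setAttribute 的自增在并发请求下不是原子操作,可能丢计数。一个假设的改进思路是往 context 里放一个 AtomicInteger,下面只演示计数器本身的线程安全(纯 JDK 示意):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    static final AtomicInteger onlineCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        // 模拟两个并发触发的 sessionCreated
        Runnable login = onlineCount::incrementAndGet;
        Thread t1 = new Thread(login);
        Thread t2 = new Thread(login);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(onlineCount.get()); // 2
    }
}
```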

练习

使用过滤器做一个权限拦截。管理员登录后将信息存到 session 中,注销后从 session 中移除,如果没有登录则无法访问成功页面

新建登录界面 jsp

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
<title>login</title>
</head>
<body>

<h1> 登录 </h1>

<form action="/servlet/login" method="post">
<input type="text" name="username">
<input type="submit">
</form>

</body>
</html>

创建 login 对应的 servlet

public class LoginServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
String username = req.getParameter("username");
if (username.equals("admin")) {
req.getSession().setAttribute(Constant.USER_SESSION, req.getSession().getId());
resp.sendRedirect("/sys/success.jsp");
} else {
resp.sendRedirect("/error.jsp");
}
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}

配置 web.xml

<servlet>
<servlet-name>LoginServlet</servlet-name>
<servlet-class>org.jzheng.servlet.LoginServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>LoginServlet</servlet-name>
<url-pattern>/servlet/login</url-pattern>
</servlet-mapping>

创建登录失败页面

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
<title>error</title>
</head>
<body>
<h1> 登录失败 </h1>
</body>
</html>

启动测试,访问 localhost:8080/login.jsp 输入 admin 成功登录,输入其他内容,登录失败跳转到 error.jsp

完善流程,创建 logout 并移除 session 属性的操作. 在 success 页面添加 logout 超链接

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
<title>success</title>
</head>
<body>

<h1> 登录成功 </h1>
<p><a href="/servlet/logout">logout</a></p>
</body>
</html>

添加 logout 的 servlet 和 web.xml 配置

public class LogoutServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
if (req.getSession().getAttribute(Constant.USER_SESSION) != null) {
req.getSession().removeAttribute(Constant.USER_SESSION);
resp.sendRedirect("/login.jsp");
}
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}
<servlet>
<servlet-name>Logout</servlet-name>
<servlet-class>org.jzheng.servlet.LogoutServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>Logout</servlet-name>
<url-pattern>/servlet/logout</url-pattern>
</servlet-mapping>

上面使用移除属性,而不是 invalid 达到 session 重用的效果。新建 session 是一个比较重的操作

重启之后,admin 登录,点击 logout 跳回到 login 页面。这时如果我们手动访问 sys/success.jsp 还是可以访问到。我们可以通过添加 filter,判断 USER_SESSION 是否为空作为跳转条件

新建 filter 类

public class SysFilter implements Filter {
@Override
public void init(FilterConfig filterConfig) throws ServletException {

}

@Override
public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException {
HttpServletRequest req = (HttpServletRequest) servletRequest;
HttpServletResponse resp = (HttpServletResponse) servletResponse;

if (req.getSession().getAttribute(Constant.USER_SESSION) == null) {
resp.sendRedirect("/login.jsp");
// 重定向之后直接返回,否则请求还会继续进入 doFilter 被放行
return;
}

filterChain.doFilter(req, servletResponse);
}

@Override
public void destroy() {

}
}

添加 web.xml 配置

<filter>
<filter-name>SysFilter</filter-name>
<filter-class>org.jzheng.servlet.SysFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>SysFilter</filter-name>
<url-pattern>/sys/*</url-pattern>
</filter-mapping>

启动服务器,直接访问 sys/success.jsp 还是显示 login 页面,被阻挡

鸽鸽鸽

DB 以及后续的项目实践部分,暂时用不到,先鸽了。

思考题

就公司需要 refactor 的代码,我有一段时间还想着,能不能把现在用到的从 session 里面拿数据的地方都换成从 request 里面拿。再仔细想一下,貌似不合适。request 的 scope 应该就只能持续到一次访问才对,设计如下的实验验证

  1. 新建 request01,对应 entrypoint r1, 在这个 request 中我们分别 request 和 session 中存储一个变量。
  2. 新建 request02,对应 entrypoint r2,在这个 request 中分别取之前 set 的变量,预期 之前 request 中 set 的变量访问不到,session 中可以
public class SetVarServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
req.setAttribute("reqVar", "reqVal");
req.getSession().setAttribute("sessionVar", "sessionVal");
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}

public class GetVarServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
resp.getWriter().write("req val: " + req.getAttribute("reqVar")
+ "; session val: " + req.getSession().getAttribute("sessionVar"));
}

@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
doGet(req, resp);
}
}
<servlet>
<servlet-name>setVar</servlet-name>
<servlet-class>com.jzheng.servlet.SetVarServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>setVar</servlet-name>
<url-pattern>/r1</url-pattern>
</servlet-mapping>

<servlet>
<servlet-name>getVar</servlet-name>
<servlet-class>com.jzheng.servlet.GetVarServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>getVar</servlet-name>
<url-pattern>/r2</url-pattern>
</servlet-mapping>

启动 tomcat 后先访问 r1 设置变量,再访问 r2 取变量,可以看到页面显示如下 req val: null; session val: sessionVal。request 中设置的变量没有拿到,而 session 中的可以拿到。

session 保存的是从浏览器与服务器建立连接到浏览器关闭这段时间内的信息,对这个概念感觉理解更充分了一点。

相对于公司的重构项目,这部分应该对应着用户登录到退出之间的操作,登录之后可以进行多个 request 的交互,所以直接换成 request scope 肯定是不可行的。

Creating Functions

Basic Script Functions

Creating a function

方式一:

function name {
    commands
}

方式二:

name() {
    commands
}

Using functions

function 不一定要写在脚本开头,但在你使用之前,它的定义必须已经出现

cat test1
#!/usr/local/bin/bash
# using a function in a script

function func1 {
    echo "This is an example of a function"
}

count=1
while [ $count -le 4 ]
do
    func1
    count=$[ $count + 1 ]
done

echo "This is the end of the loop"
func1
echo "Now this is the end of the script"

./test1
# This is an example of a function
# This is an example of a function
# This is an example of a function
# This is an example of a function
# This is the end of the loop
# This is an example of a function
# Now this is the end of the script

Returning a Value

bash shell 会在函数结束后给它一个返回值(exit status)

The default exit status

默认情况下,exit status 是函数中最后一个命令的返回值,可以使用 $? 查看

下面实验中,我们在 func1 中 ls 一个不存在的文件,并打印 exit code

cat test4
#!/usr/local/bin/bash
# testing the exit status of a function

function func1 {
    echo "Trying to display a non-existent file"
    ls -l badfile
}

echo "testing the function"
func1
echo "The exit status is: $?"

./test4
# testing the function
# Trying to display a non-existent file
# ls: badfile: No such file or directory
# The exit status is: 1

cat test4b
#!/usr/local/bin/bash
# testing the exit status of a function

function func1 {
    ls -l badfile
    echo "Trying to display a non-existent file"
}

echo "testing the function"
func1
echo "The exit status is: $?"

./test4b
# testing the function
# ls: badfile: No such file or directory
# Trying to display a non-existent file
# The exit status is: 0

Using the return command

你也可以用 return 关键字在函数末尾返回一个整数作为 exit code

cat test5
#!/usr/local/bin/bash
# using the return command in a function

function db1 {
    read -p "Enter a value: " value
    echo "doubling the value"
    return $[ $value * 2 ]
}

db1
echo "The new value is $?"

./test5
# Enter a value: 10
# doubling the value
# The new value is 20

注意点:

  • 记得在 function 执行完成后立刻保存 $? 备用,后面的命令会覆盖它
  • 通过 return 返回的 exit code 只能在 0-255 之间(超出会按 256 取模)
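上面两个注意点可以用一个小脚本验证(假设用 bash 运行,double 是为演示虚构的函数):

```shell
#!/bin/bash
# 演示:$? 要立刻保存,且 return 值超过 255 会按 256 取模
double() { return $(( $1 * 2 )); }

double 10
result=$?              # 立刻保存,后面的 echo 会覆盖 $?
echo "result=$result"  # result=20

double 200             # 400 超出 0-255,实际得到 400 % 256 = 144
echo "wrapped=$?"      # wrapped=144
```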

Using function output

你也可以把返回值放进输出里,通过 result=$(db1) 的形式拿到,和之前调用系统命令是一个套路

cat test5b
#!/usr/local/bin/bash
# using the return command in a function

function db1 {
    read -p "Enter a value: " value
    echo $[ $value * 2 ]
}

result=$(db1)
echo "The new value is $result"

./test5b
# Enter a value: 20
# The new value is 40

Caution 如果函数中有多个 echo,它会把所有 echo 的内容合在一起作为返回值

cat freturn.sh
#!/usr/local/bin/bash
# multiple echo sentence in function

function multiecho {
    echo "return 1"
    echo "return 2"
}

multiecho

./freturn.sh
# return 1
# return 2

Question 如果我把 echo 和 return 结合使用,它会拿什么当返回值?

结论:输出里只有 echo 的内容,return 的值被忽略(它只影响 $?)

cat ./returnwithecho.sh
#!/usr/local/bin/bash
# multiple echo sentence and return in function

function multiecho {
    echo "return 1"
    echo "return 2"
    return 55
}

multiecho

./returnwithecho.sh
# return 1
# return 2
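这个结论可以再用 $(...) 捕获的方式验证一遍:捕获到的只有 echo 的输出,return 55 只出现在 $? 里(一个最小示例):

```shell
#!/bin/bash
# echo 的内容被 $() 捕获,return 的值只体现在退出码里
multiecho() {
  echo "return 1"
  echo "return 2"
  return 55
}

out=$(multiecho)
status=$?
echo "captured: $out"        # captured: return 1 / return 2(含换行)
echo "exit status: $status"  # exit status: 55
```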

Using Variables in Functions

Passing parameters to a function

cat test6
#!/usr/local/bin/bash
# passing parameters to a function

function addem {
    if [ $# -eq 0 ] || [ $# -gt 2 ]
    then
        echo -1
    elif [ $# -eq 1 ]
    then
        echo $[ $1 + $1 ]
    else
        echo $[ $1 + $2 ]
    fi
}

echo -n "Adding 10 and 15: "
value=$(addem 10 15)
echo $value
echo -n "Let's try adding just one number: "
value=$(addem 10)
echo $value
echo -n "Now trying adding no numbers: "
value=$(addem)
echo $value
echo -n "Finally, try adding three numbers: "
value=$(addem 10 15 20)
echo $value

./test6
# Adding 10 and 15: 25
# Let's try adding just one number: 20
# Now trying adding no numbers: -1
# Finally, try adding three numbers: -1

但是脚本的参数默认并不会传递到你定义的 function 中,你需要显式指定才能使其生效

下面实验中,当我们运行脚本时没有给参数,则直接走了 else 的路径;如果给了参数,这些参数也不会传递给 badfunc1

cat badtest1
#!/usr/local/bin/bash
# Trying to access script parameters inside a function

function badfunc1 {
    echo $[ $1 * $2 ]
}

if [ $# -eq 2 ]
then
    value=$(badfunc1)
    echo "The result is $value"
else
    echo "Usage: badtest1 a b"
fi

./badtest1
# Usage: badtest1 a b
./badtest1 10 15
# ./badtest1: line 5: * : syntax error: operand expected (error token is "* ")
# The result is

如果要修复上面的脚本,可以在调用处显式传参 value=$(badfunc1 $1 $2)

cat test7
#!/usr/local/bin/bash
# Trying to access script parameters inside a function

function func7 {
    echo $[ $1 * $2 ]
}

if [ $# -eq 2 ]
then
    value=$(func7 $1 $2)
    echo "The result is $value"
else
    echo "Usage: badtest1 a b"
fi

./test7
# Usage: badtest1 a b
./test7 10 15
# The result is 150

Handling variables in a function

Functions 使用两种类型的变量

  • Global
  • Local

Global 变量是在 script 中任何地方都可以访问的变量。主程序中定义的 global 变量,function 中可以访问;function 中定义的 global 变量,主程序也能访问。一般来说,你在脚本中定义的变量默认都是 global 的

cat test8
#!/usr/local/bin/bash
# Using a global variable to pass a value

function db1 {
    value=$[ $value * 2 ]
}

read -p "Enter a value: " value
db1
echo "The new value is: $value"

./test8
# Enter a value: 10
# The new value is: 20

如果你在 script 和 function 中用了同名的变量,就可能导致赋值出问题,使你的程序难以排错

cat badtest2
#!/usr/local/bin/bash
# Demonstrating a db use of variables

function func1 {
    temp=$[ $value + 5 ]
    result=$[ $temp * 2 ]
}

temp=4
value=6

func1
echo "The result is $result"
if [ $temp -gt $value ]
then
    echo "temp is larger"
else
    echo "temp is smaller"
fi

./badtest2
# The result is 22
# temp is larger

我们可以使用 local variables 避免这种问题,格式如 local temp,它让变量只在 function 内部生效

cat test9
#!/usr/local/bin/bash
# Demonstrating the local keyword

function func1 {
    local temp=$[ $value + 5 ]
    result=$[ $temp * 2 ]
}

temp=4
value=6

func1
echo "The result is $result"
if [ $temp -gt $value ]
then
    echo "temp is larger"
else
    echo "temp is smaller"
fi


./test9
# The result is 22
# temp is smaller

Array Variables and Functions

Passing arrays to functions

函数处理数组的方式有点特别:如果直接把数组变量传给函数,它只会取第一个元素作为参数

cat badtest3
#!/usr/local/bin/bash
# Trying to pass an array variable

function testit {
    echo "The parameters are: $@"
    thisarray=$1
    echo "The received array is ${thisarray[*]}"
}

myarray=(1 2 3 4 5)
echo "The original array is: ${myarray[*]}"
testit $myarray

./badtest3
# The original array is: 1 2 3 4 5
# The parameters are: 1
# The received array is 1

你可以先将数组 disassemble 成单个的值传给函数,在函数里再 reassemble 成数组即可。书上给的例子不能运行,网上找了一个可用的表达式

cat test10
#!/usr/local/bin/bash
# array variable to function test

function testit {
    local newarray
    # 原始表达式为 newarray=(;'echo "$@"')
    newarray=($(echo "$@"))
    echo "The new array value is ${newarray[*]}"
}

myarray=(1 2 3 4 5)
echo "The original array is: ${myarray[*]}"
testit ${myarray[*]}

./test10
# The original array is: 1 2 3 4 5
# The new array value is 1 2 3 4 5
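另一种常见做法(补充示例,非书中内容):调用时用带引号的 "${array[@]}" 按元素展开,函数里用 ("$@") 重组,这样带空格的元素也不会被拆散:

```shell
#!/bin/bash
# "${myarray[@]}" 按元素展开;("$@") 原样重组成新数组
testit() {
  local newarray=("$@")
  echo "count: ${#newarray[@]}"
}

myarray=("one two" "three")
testit "${myarray[@]}"   # count: 2
```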

Return arrays from functions

函数返回数组用的是同样的技巧:函数里把值逐个 echo 出来,接收方再 reassemble 成数组

cat test12
#!/usr/local/bin/bash
# returning an array value

function arraydblr {
    local origarray
    local newarray
    local elements
    local i
    origarray=($(echo "$@"))
    newarray=($(echo "$@"))
    elements=$[ $# - 1 ]
    for (( i=0; i<=$elements; i++ ))
    {
        newarray[$i]=$[ ${origarray[$i]} * 2 ]
    }
    echo ${newarray[*]}
}

myarray=(1 2 3 4 5)
echo "The original array is: ${myarray[*]}"
arg1=$(echo ${myarray[*]})
result=($(arraydblr $arg1))
echo "The new array is: ${result[*]}"

./test12
# The original array is: 1 2 3 4 5
# The new array is: 2 4 6 8 10

Function Recursion

local function variable 提供了 self-containment 的能力,使函数可以递归(recursively)调用自己

实现阶乘(factorial)

cat test13
#!/usr/local/bin/bash
# using recursion

function factorial {
    if [ $1 -eq 1 ]
    then
        echo 1
    else
        local temp=$[ $1 - 1 ]
        local result=$(factorial $temp)
        echo $[ $result * $1 ]
    fi
}

read -p "Enter value: " value
result=$(factorial $value)
echo "The factorial of $value is: $result"

./test13
# Enter value: 5
# The factorial of 5 is: 120

Creating a Library

建立自己的函数库,实现脚本间的复用。简单来说,就是把常用函数都写到一个文件里,在其他脚本文件中通过 . path/to/myfuncs 的语法引入即可。这个点号是 source 的快捷方式,叫做 dot operator.

cat myfuncs
#!/usr/local/bin/bash
# my script functions

function addem {
    echo $[ $1 + $2 ]
}

function multem {
    echo $[ $1 * $2 ]
}

function divem {
    if [ $2 -ne 0 ]
    then
        echo $[ $1 / $2 ]
    else
        echo -1
    fi
}

cat test14
#!/usr/local/bin/bash
# using functions defined in a library file
. ./myfuncs

value1=10
value2=5
result1=$(addem $value1 $value2)
result2=$(multem $value1 $value2)
result3=$(divem $value1 $value2)
echo "The result of adding them is: $result1"
echo "The result of multiplying them is: $result2"
echo "The result of dividing them is: $result3"

./test14
# The result of adding them is: 15
# The result of multiplying them is: 50
# The result of dividing them is: 2

Using Functions on the Command Line

Creating functions on the command line

方式一:终端一行定义

function doubleit { read -p "Enter value:" value; echo $[ $value * 2 ]; }
doubleit
# Enter value:3
6

方式二:多行定义,这种方式行末不需要分号

function multem {
> echo $[ $1 * $2 ]
> }
multem 2 5
# 10

Caution 如果你在终端定义的函数和系统命令重名了,那么这个函数会覆盖原来的命令。
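一个小演示(示例函数内容是随意编的):

```shell
#!/bin/bash
# 定义与 ls 同名的函数后,直接输入 ls 调用的是函数
ls() { echo "my ls"; }
ls                      # 输出 my ls,而不是目录列表
command ls >/dev/null   # command 可以绕过函数调用原命令
unset -f ls             # 删除函数,恢复默认行为
```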

Defining functions in the .bashrc file

上面这种方式在终端退出后函数就丢失了,你可以将函数写入 .bashrc 文件或使用 source 函数库的方式达到复用的效果。而且最方便的是,如果你将它们通过 .bashrc 引入,你在写脚本的时候就不需要 source 了,直接可以调用
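这个效果可以用一个临时函数库文件模拟(路径 /tmp/myfuncs.sh 是为演示随意取的):

```shell
#!/bin/bash
# 把函数写进文件,source 之后当前 shell 就可以直接调用
cat > /tmp/myfuncs.sh << 'EOF'
function addem { echo $(( $1 + $2 )); }
EOF

. /tmp/myfuncs.sh   # 写进 .bashrc 的话,每个新终端都会自动做这一步
addem 10 5          # 15
```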

Following a Practical Example

这章展示如何使用开源 shell 工具包

  1. wget ftp://ftp.gnu.org/gnu/shtool/shtool-2.0.8.tar.gz 下载实验包并 tar -zxvf shtool-2.0.8.tar.gz 解压
  2. cd 到解压后的文件夹,./configure + make, 当然你也可以用 make test 测试一下构建
  3. make install 安装库文件

shtool 包含了一系列的工具集,可以使用 shtool [options] [function [options] [args]] 的格式调用

The shtool Library Functions

Function Description
platform Displays the platform identity
prop Displays an animated progress propeller

只列出用到的几个

cat test16
#!/usr/local/bin/bash

shtool platform

./test16
# Mac OS X 11.4 (iX86)

带进度条的显示 ls -al /usr/bin | shtool prop -p "waiting..." 太快了,看不出效果

Chapter 18: Writing Scripts for Graphical Desktops

和我这次看书的目标不符,跳过

Chapter 19: Introducing sed and gawk

实际工作中,很多工作都是文字处理相关的。使用 shell 自带工具处理文字会显得很笨拙。这时候就要用到 sed 和 gawk 了。

Manipulating Text

Mac 自带的 sed 工具和书上的是不一样的,好像做了很多裁剪,很多 flag 是不支持的,可以通过 brew 重新装一个

brew install gnu-sed
# 会给出添加 PATH 的提示,按照提示添加到配置文件中(.zshrc)
brew info gnu-sed
# ==> Caveats
# GNU "sed" has been installed as "gsed".
# If you need to use it as "sed", you can add a "gnubin" directory
# to your PATH from your bashrc like:

# PATH="/usr/local/opt/gnu-sed/libexec/gnubin:$PATH"

Getting to know the sed editor

sed 是一个流处理编辑器(stream editor),你可以设定一系列的规则,然后通过这个流编辑器处理他。

sed 可以做如下事情

  1. Reads one data line at a time from the input
  2. Matches that data with the supplied editor commands
  3. Changes data in the stream as specified in the commands
  4. Outputs the new data to STDOUT

按行依次处理文件直到所有内容处理完毕结束,格式 sed options script file

The sed Command Options

Option Description
-e script Adds commands specified in the script to the commands run while processing the input
-f file Adds the commands specified in the file to the commands run while processing the input
-n Doesn’t produce output for each command, but waits for the print command

Defining an editor command in the command line

echo "This is a test" | sed 's/test/big test/'
# This is a big test

s 表示替换(substitute),它会用第二个字符串替换第一个。下面是替换文件内容的例子。sed 只会在输出内容中做修改,原文件还是保持原样

cat data1.txt
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.
sed 's/dog/cat/' data1.txt
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy cat.

Using multiple editor commands in the command line

sed -e 's/brown/green/; s/dog/cat/' data1.txt 
# The quick green fox jumps over the lazy cat.
# The quick green fox jumps over the lazy cat.
# The quick green fox jumps over the lazy cat.
# The quick green fox jumps over the lazy cat.

如果不想写在一行,可以写在多行

sed -e '
> s/brown/green/
> s/fox/elephant/
> s/dog/cat/' data1.txt
# The quick green elephant jumps over the lazy cat.
# The quick green elephant jumps over the lazy cat.
# The quick green elephant jumps over the lazy cat.
# The quick green elephant jumps over the lazy cat.

Reading editor commands from a file

如果命令太多,也可以将他们放到文件中

cat script1.sed 
# s/brown/green/
# s/fox/elephant/
# s/dog/cat/
sed -f script1.sed data1.txt
# The quick green elephant jumps over the lazy cat.
# The quick green elephant jumps over the lazy cat.
# The quick green elephant jumps over the lazy cat.
# The quick green elephant jumps over the lazy cat.

为了便于区分 shell 脚本和 sed 脚本,我们将 sed 脚本文件以 .sed 结尾

Getting to know the gawk program

sed 让你动态改变文件内容,但还是有局限性。gawk 提供了更程序化的方式来处理文本信息。这个工具默认是没有的,一般需要自己手动安装。gawk 是 GNU 版本的 awk,通过它你可以

  • Define variables to store data
  • Use arithmetic and string operators to operate on data
  • Use structured programming concepts, such as if-then statements and loops, to add logic to your data processing
  • Generate formatted reports by extracting data elements within the data file and repositioning them in another order or format

第四点经常用来批量处理数据使之更具可读性,典型应用就是处理 log 文件。
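比如第四点说的格式化报表,可以用关联数组做个分组统计(数据是虚构的 passwd 片段;这里用系统自带的 awk 演示,gawk 语法相同):

```shell
# 按第二个字段(shell)分组计数,END 里输出汇总报表
printf 'root:/bin/sh\ndaemon:/usr/bin/false\nnobody:/usr/bin/false\n' |
  awk -F: '{count[$2]++} END {for (s in count) print s, count[s]}' | sort
# /bin/sh 1
# /usr/bin/false 2
```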

Visiting the gawk command format

格式 gawk options program file

The gawk Options

Option Description
-F fs Specifies a file separator for delineating data fields in a line
-f file Specifies a file name to read the program from
-v var=value Defines a variable and default value used in the gawk program
-mf N Specifies the maximum number of fields to process in the data file
-mr N Specifies the maximum record size in the data file
-W keyword Specifies the compatibility mode or warning level of gawk

gawk 最大的优势是可以用编程的手段,将数据重新格式化输出

Reading the program script from the command line

格式 gawk '{ commands }',比如 gawk '{print "Hello World!"}'。这个 demo 命令并没有做什么文字处理,只是简单地接收标准输入然后打印 Hello World,使用 Ctrl + D 结束输入

gawk 最主要的功能是提供操作文本中数据的功能,默认情况下,gawk 会提取如下变量

  • $0 represents the entire line of text
  • $1 represents the first data field in the line of text
  • $2 represents the second data field in the line of text
  • $n represents the nth data field in the line of text

gawk 会根据命令中提供的分隔符将每行分割成字段,下面是 gawk 读取文件并显示每行第一个字段的示例

cat data2.txt 
# One line of test text.
# Two lines of test text.
# Three lines of test text.

gawk '{ print $1 }' data2.txt
# One
# Two
# Three

可以用 -F 指定分隔符,比如你要处理 /etc/passwd 这个文件

cat /etc/passwd | tail -3
# _coreml:*:280:280:CoreML Services:/var/empty:/usr/bin/false
# _trustd:*:282:282:trustd:/var/empty:/usr/bin/false
# _oahd:*:441:441:OAH Daemon:/var/empty:/usr/bin/false

gawk -F: '{print $1}' /etc/passwd | tail -3
# _coreml
# _trustd
# _oahd

Using multiple commands in the program script

使用分号分割 gawk 中想要运行的多个命令, 下面的命令会替换第四个 field 并打印整行

echo "My name is Rich" | gawk '{$4="Christine";print $0}'
# My name is Christine

多行表示也是 OK 的

gawk '{
> $4="Christine"
> print $0
> }'
my name is Rich
my name is Christine

Reading the program from a file

和 sed 一样,gawk 也支持从文件读取命令

cat script2.gawk
# {print $1 "'s home directory is " $6}
gawk -F: -f script2.gawk /etc/passwd | tail -3
# _coreml's home directory is /var/empty
# _trustd's home directory is /var/empty
# _oahd's home directory is /var/empty

gawk 文件中包含多个命令的示例

cat script3.gawk 
# {
# text = "'s home directory is "
# print $1 text $6
# }
gawk -F: -f script3.gawk /etc/passwd | tail -3
# _coreml's home directory is /var/empty
# _trustd's home directory is /var/empty
# _oahd's home directory is /var/empty

Running scripts before processing data

gawk 提供了 BEGIN 关键字在处理文本前做一些操作

gawk 'BEGIN {print "Hello World!"}'
# Hello World!

BEGIN 处理文本的示例

cat data3.txt 
# Line 1
# Line 2
# Line 3
gawk 'BEGIN {print "The data3 File Contents: "}
> {print $0}' data3.txt
# The data3 File Contents:
# Line 1
# Line 2
# Line 3

Running scripts after processing data

和 BEGIN 对应的还有一个 END 关键字,用于在处理完文本后做一些操作

gawk 'BEGIN {print "The data3 File Contents:"}
{print $0}
> END {print "End of File"}' data3.txt
# The data3 File Contents:
# Line 1
# Line 2
# Line 3
# End of File

如果过程多了,你还可以将这个步骤写到文件中

cat script4.gawk
BEGIN {
    print "The latest list of users and shells"
    print " UserID\t Shell"
    print "-------\t------"
    FS=":"
}

{
    print $1 " \t " $7
}

END {
    print "This concludes the listing"
}

gawk -f script4.gawk /etc/passwd
# The latest list of users and shells
# UserID Shell
# ------- ------
# nobody /usr/bin/false
# ...
# _oahd /usr/bin/false
# This concludes the listing

Commanding at the sed Editor Basics

本章简要介绍一下 sed 的常规用法

Introducing more substitution options

Substituting flags
cat data4.txt 
# This is a test of the test script.
# This is the second test of the test script.
sed 's/test/trial/' data4.txt
# This is a trial of the test script.
# This is the second trial of the test script.

默认情况下,sed 只替换每行中第一次出现的位置,如果想要替换多处,需要指定 flags,格式为 s/pattern/replacement/flags

four types of substitution flags are available:

  • A number, indicating the pattern occurrence for which new text should be substituted
  • g, indicating that new text should be substituted for all occurrences of the existing text
  • p, indicating that the contents of the original line should be printed
  • w file, which means to write the results of the substitution to a file

替换指定位置的示例,下面示例中只替换了第二个位置的 test

sed 's/test/trial/2' data4.txt 
# This is a test of the trial script.
# This is the second test of the trial script.

全部替换示例

sed 's/test/trial/g' data4.txt 
This is a trial of the trial script.
This is the second trial of the trial script.

打印符合匹配条件的行

cat data5.txt 
# This is a test line.
# This is a different line.

sed -n 's/test/trial/p' data5.txt
# This is a trial line.

w flag 指定将 sed 的替换结果写到文件

sed 's/test/trial/w test.txt' data5.txt 
# This is a trial line.
# This is a different line.
cat test.txt
# This is a trial line.
Replacing characters

Linux 系统的路径分隔符和 sed 默认的分隔符是冲突的。也就是说,如果要用 sed 替换路径,就必须用一种很累赘的写法,比如 sed 's/\/bin\/bash/\/bin\/csh/' /etc/passwd

PS: Mac OS 语法和这个不一样

为了避免这么恶心的写法,我们可以用惊叹号(exclamation point) 代替原来的分割符 sed 's!/bin/bash!/bin/csh!' /etc/passwd
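一个不依赖 /etc/passwd 的最小示例:

```shell
# 用 ! 作为 s 命令的分隔符,路径里的 / 就不用转义了
echo "/bin/bash" | sed 's!/bin/bash!/bin/csh!'
# /bin/csh
```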

Using addresses

默认情况下 sed 会处理所有的行,如果只需要处理特定的几行,可以使用 line address。line address 有两种模式

  • A numeric range of lines
  • A text pattern that filters out a line

两种模式的格式都是一样的 [address]command 你可以将多个命令组合到一起

address {
    command1
    command2
    command3
}
Addressing the numeric line

sed 会将 s 命令之前的数字当作行号来处理,下面的例子只替换第二行的内容

sed '2s/dog/cat/' data1.txt
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.

替换多行

sed '2,3s/dog/cat/' data1.txt
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy dog.

从第 n 行开始到结束

sed '2,$s/dog/cat/' data1.txt
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy cat.
# The quick brown fox jumps over the lazy cat.
Using text pattern filters

使用 pattern 处理特定的行,格式 /pattern/command, 你需要在 pattern 前面指定一个斜杠作为开始

下面的示例中我们只将 root 的 sh 改为 csh

grep root /etc/passwd
# root:*:0:0:System Administrator:/var/root:/bin/sh
# daemon:*:1:1:System Services:/var/root:/usr/bin/false
# _cvmsroot:*:212:212:CVMS Root:/var/empty:/usr/bin/false
sed '/root/s/sh/csh/' /etc/passwd | grep root
# root:*:0:0:System Administrator:/var/root:/bin/csh
# daemon:*:1:1:System Services:/var/root:/usr/bin/false
# _cvmsroot:*:212:212:CVMS Root:/var/empty:/usr/bin/false

sed 是通过正则表达式来做内容匹配的。

Grouping commands

和 gawk 一样,sed 也可以在一个命令中处理多个匹配

sed '2{
s/fox/elephant/
s/dog/cat/
}' data1.txt
# The quick brown fox jumps over the lazy dog.
# The quick brown elephant jumps over the lazy cat.
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.

sed '3,${
s/fox/elephant/
s/dog/cat/
}' data1.txt
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.
# The quick brown elephant jumps over the lazy cat.
# The quick brown elephant jumps over the lazy cat.
Deleting lines

d 用来删除输出中的某些行。如果不指定筛选条件,所有行都会被删除

cat data6.txt 
# This is line number 1.
# This is line number 2.
# This is line number 3.
# This is line number 4.
sed 'd' data6.txt

指定删除的行

sed '3d' data6.txt 
# This is line number 1.
# This is line number 2.
# This is line number 4.

sed '2,3d' data6.txt
# This is line number 1.
# This is line number 4.

sed '3,$d' data6.txt
# This is line number 1.
# This is line number 2.

sed '/number 1/d' data6.txt
# This is line number 2.
# This is line number 3.
# This is line number 4.

PS: 这个删除只作用的输出,原文件保持不变

还有一种比较奇葩的删除方式:给两个匹配 pattern,删除从匹配到第一个 pattern 的行开始,到匹配到第二个 pattern 的行结束,删除内容包含这两行本身

sed '/1/,/3/d' data6.txt
# This is line number 4.

这里有一个坑:这种方式是匹配删除,第一个 pattern 匹配时删除开始,第二个 pattern 匹配时删除结束。如果文件中有多处能匹配到开始 pattern,就可能出现意想不到的情况。比如在 data7 中,第 5 行也能匹配到 1 这个关键字,但后面再没有能匹配 3 的行了,导致后面的内容全被删掉。如果指定一个不存在的停止符,则所有内容都不显示

cat data7.txt 
# This is line number 1.
# This is line number 2.
# This is line number 3.
# This is line number 4.
# This is line number 1 again.
# This is text you want to keep.
# This is the last line in the file.

sed '/1/,/3/d' data7.txt
# This is line number 4.

sed '/1/,/5/d' data7.txt
Inserting and appending text

sed 也允许你插入(insert)和追加(append)内容,但是有一些特别的点

  • The insert command(i) adds a new line before the specified line
  • The append command(a) adds a new line after the specified line

特别的点在于,你需要新起一行写这些新加的行, 格式为

sed '[address]command\
new line'

示例如下, 不过 mac 上貌似有语法错误

echo "Test Line 2" | sed 'i\Test line 1'
# Test line 1
# Test Line 2

echo "Test Line 2" | sed 'a\Test line 1'
# Test Line 2
# Test line 1

echo "Test Line 2" | sed 'i\
> Test Line 1'
# Test Line 1
# Test Line 2

上面演示的是在全部内容之前/后添加新行,那么怎么在特定行前后做类似的操作呢?可以用行号指定。书上说不能用 range 形式,因为定义上 i/a 是单行操作

sed '3i\
This is an inserted line.' data6.txt
# This is line number1.
# This is line number2.
# This is an inserted line.
# This is line number3.
# This is line number4.
sed '3a\This is an appended line.' data6.txt
# This is line number1.
# This is line number2.
# This is line number3.
# This is an appended line.
# This is line number4.

插入文本末尾

sed '$a\
> This is a new line of text.' data6.txt
# This is line number1.
# This is line number2.
# This is line number3.
# This is line number4.
# This is a new line of text.

头部插入多行,行与行之间需要用反斜杠分割

sed '1i\
> This is one line of new text.\
> This is another line of new text.' data6.txt
# This is one line of new text.
# This is another line of new text.
# This is line number1.
# This is line number2.
# This is line number3.
# This is line number4.

如果不指定行号,它会在每一行前面都 insert 啊,和之前的理解不一样。append 也是同样的效果

sed 'i\head insert' data6.txt
# head insert
# This is line number1.
# head insert
# This is line number2.
# head insert
# This is line number3.
# head insert
# This is line number4

也能指定 range……前面"不能用 range"的理解果然有问题

sed '1,2a\end append' data6.txt
# This is line number1.
# end append
# This is line number2.
# end append
# This is line number3.
# This is line number4.
Changing lines

改变行内容,用法和前面的 i/a 没什么区别

sed '3c\This is a changed line of text.' data6.txt
# This is line number1.
# This is line number2.
# This is a changed line of text.
# This is line number4.

pattern 方式替换

sed '/number3/c\
This is a changed line of text.' data6.txt
# This is line number1.
# This is line number2.
# This is a changed line of text.
# This is line number4.

pattern 替换多行

cat data8.txt
# This is line number1.
# This is line number2.
# This is line number3.
# This is line number4.
# This is line number1 again.
# This is yet another line.
# This is the last line in the line.

sed '/number1/c\This is a changed line of text.' data8.txt
# This is a changed line of text.
# This is line number2.
# This is line number3.
# This is line number4.
# This is a changed line of text.
# This is yet another line.
# This is the last line in the line.

指定行号 range 替换的行为方式有点奇怪:它会把整个 range 的内容替换成一行,而不是逐行替换

cat data6.txt
# This is line number1.
# This is line number2.
# This is line number3.
# This is line number4.

sed '2,3c\This is a new line of text.' data6.txt
# This is line number1.
# This is a new line of text.
# This is line number4.
Transforming characters

transform(y) 是 sed 中唯一一个用于替换单个字符的命令,格式 [address]y/inchars/outchars/。inchars 和 outchars 必须是等长的,不然会报错。它做一对一替换,比如 y/123/789/ 会用 7 代替 1,8 代替 2,依次类推

sed 'y/123/789/' data8.txt
# This is line number7.
# This is line number8.
# This is line number9.
# This is line number4.
# This is line number7 again.
# This is yet another line.
# This is the last line in the line.

而且他是全局替换,任何出现的地方都会被换掉

echo "This 1 is a test of 1 try." | sed 'y/123/456/'
# This 4 is a test of 4 try.
Printing revisited

和 p flag 类似的还有两个符号,表示如下

  • The p command to print a text line
  • The equal sign(=) command to print line numbers
  • The l(lowercase L) command to list a line

p 使用案例, -n 可以强制只打印匹配的内容

echo "this is a test" | sed 'p'
# this is a test
# this is a test

cat data6.txt
# This is line number1.
# This is line number2.
# This is line number3.
# This is line number4.

sed -n '/number3/p' data6.txt
# This is line number3.

sed -n '2,3p' data6.txt
# This is line number2.
# This is line number3.

找到匹配的行,先打印原始值,再替换并打印。

sed -n '/3/{
> p
> s/line/test/p
> }' data6.txt
# This is line number3.
# This is test number3.

equals 相关的案例,输出行号

cat data1.txt 
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.
# The quick brown fox jumps over the lazy dog.
sed '=' data1.txt
# 1
# The quick brown fox jumps over the lazy dog.
# 2
# The quick brown fox jumps over the lazy dog.
# 3
# The quick brown fox jumps over the lazy dog.
# 4
# The quick brown fox jumps over the lazy dog.

搜索匹配的内容并打印行号

sed -n '/number 4/{
> =
> p
> }' data6.txt
# 4
# This is line number 4.

l - listing lines,打印文本和不可见字符(non-printable characters)。下面的实验中,tab 符号打印失败了,可能是什么设置问题吧,不过结尾符 $ 倒是没什么问题

cat data9.txt 
# This line contains tabs.
sed -n 'l' data9.txt
# This line contains tabs.$

Using files with sed

Writing to a file

通过 w 将匹配的内容写到文件 [address]w filename, 使用 -n 只在屏幕上显示匹配部分

sed '1,2w test.txt' data6.txt 
# This is line number 1.
# This is line number 2.
# This is line number 3.
# This is line number 4.
cat test.txt
# This is line number 1.
# This is line number 2.

这个技巧在筛选数据的时候格外好用

cat data11.txt 
# Blum, R Browncoat
# McGuiness, A Alliance
# Bresnahan, C Browncoat
# Harken, C Alliance

sed -n '/Browncoat/w Browncoats.txt' data11.txt
cat Browncoats.txt
# Blum, R Browncoat
# Bresnahan, C Browncoat
Reading data from a file

The read command(r) allows you to insert data contained in a separate file. Format: [address]r filename

filename 可以是相对路径,也可以是绝对路径。你不能使用 range of address for the read command. you can only specify a single line number or text pattern address.

读取目标文件中的内容并插入到指定位置的后面

cat data12.txt 
# This is an added line.
# This is the second added line.

cat data6.txt
# This is line number 1.
# This is line number 2.
# This is line number 3.
# This is line number 4.

sed '3r data12.txt' data6.txt
# This is line number 1.
# This is line number 2.
# This is line number 3.
# This is an added line.
# This is the second added line.
# This is line number 4.

pattern 同样支持

sed '/number 2/r data12.txt' data6.txt 
# This is line number 1.
# This is line number 2.
# This is an added line.
# This is the second added line.
# This is line number 3.
# This is line number 4.

添加到末尾

sed '$r data12.txt' data6.txt 
# This is line number 1.
# This is line number 2.
# This is line number 3.
# This is line number 4.
# This is an added line.
# This is the second added line.

将 read 和 delete 结合使用,我们就可以有类似于替换的效果了

下面例子中我们将名单用 LIST 这个单词做为占位符,将 data11.txt 中的内容替换进去

cat notice.std 
# Would the following people:
# LIST
# please report to the ship's captain.

sed '/LIST/{
> r data11.txt
> d
> }' notice.std
# Would the following people:
# Blum, R Browncoat
# McGuiness, A Alliance
# Bresnahan, C Browncoat
# Harken, C Alliance
# please report to the ship's captain.

Chapter 20: Regular Expressions

What Are Regular Expressions

A definition

A regular expression is a pattern template you define that a Linux utility uses to filter text.

Types of regular expressions

Linux 系统中,一些不同的应用采用不同的正则表达式。正则表达式通过 regular expression engine 实现。Linux 世界中有两个爆款 engines:

  • The POSIX Basic Regular Expression(BRE) engine
  • The POSIX Extended Regular Expression(ERE) engine

大多数 Linux 工具都至少适配 BRE,sed 除外,它的目标是尽可能快地处理,所以只识别 BRE 的一个子集。
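两个引擎的差别可以用 + 量词感受一下(+ 属于 ERE;grep -E 和 sed -E 用于开启 ERE 模式,现代 GNU/BSD 工具一般都支持 -E 这个开关):

```shell
# ERE 的 + 表示前面的字符出现一次或多次
echo "beet" | grep -E 'be+t'       # 匹配,输出 beet
echo "beet" | sed -nE '/be+t/p'    # sed 开启 ERE 模式后同样匹配
```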

Defining BRE Patterns

Plain text

搜索的关键字是目标的一部分即可

Special characters

正则可以识别的特殊字符有 .*[]^${}\+?|(),如果想匹配这些字符本身,需要在它们前面加 backslash character(\)

echo 'The cost is $4.00' | sed -n '/\$/p'
# The cost is $4.00

Anchor characters

Starting at the beginning

caret character(^) 指代文本开头

echo "The book store" | sed -n '/^book/p'
# no matched
echo "Books are great" | sed -n '/^Book/p'
# Books are great

当符号出现在非开头位置,则他会被当作普通字符处理

echo "This ^ is a test" | sed -n '/s ^/p' 
# This ^ is a test
Looking for the ending

The dollar sign($) 指代了文本的结尾

echo "This is a good book" | sed -n '/book$/p'
# This is a good book

The dot character

dot 用于指代除换行外的任何字符, 如果 . 代表的位置没有东西,则匹配失败

cat data6 
# This is a test of a line.
# The cat is sleeping.
# That is a very nice hat.
# This test is at line four.
# at ten o'clock we'll go home.
sed -n '/.at/p' data6
# The cat is sleeping.
# That is a very nice hat.
# This test is at line four.

Character classes

character class 用于限定匹配的内容,使用 square brackets([]) 表示

sed -n '/[ch]at/p' data6
# The cat is sleeping.
# That is a very nice hat.

Negating character classes

和前面的相反,是不包含的意思

sed -n '/[^ch]at/p' data6
# This test is at line four.

Using ranges

sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8 这个技巧也适用于字符

sed -n '/[c-h]at/p' data6
# The cat is sleeping.
# That is a very nice hat.

也可以指定非连续的字符集合

sed -n '/[a-ch-m]at/p' data6
# The cat is sleeping.
# That is a very nice hat.
echo "I'm getting too fat" | sed -n '/[a-ch-m]at/p'
# no matched

Special character classes

BRE Special Character classes

Class Description
[[:alpha:]] Matches any alphabetical character, either upper or lower case
[[:alnum:]] Matches any alphanumeric character 0-9, A-Z or a-z
[[:blank:]] Matches a space or Tab character
[[:digit:]] Matches a numerical digit from 0-9
[[:lower:]] Matches any lowercase alphabetical character a-z
[[:print:]] Matches any printable character
[[:punct:]] Matches a punctuation character
[[:space:]] Matches any whitespace character: space, Tab, NL, FF, VT, CR
[[:upper:]] Matches any uppercase alphabetical character A-Z
1
2
3
4
5
6
7
8
9
10
echo "abc" | sed -n '/[[:digit:]]/p'
# no matched
echo "abc" | sed -n '/[[:alpha:]]/p'
# abc
echo "abc123" | sed -n '/[[:digit:]]/p'
# abc123
echo "This is, a test" | sed -n '/[[:punct:]]/p'
# This is, a test
echo "This is a test" | sed -n '/[[:punct:]]/p'
# no matched

The asterisk

The asterisk means the preceding character may appear zero or more times:

echo "ik" | sed -n '/ie*k/p'
# ik
echo "iek" | sed -n '/ie*k/p'
# iek
echo "ieeeek" | sed -n '/ie*k/p'
# ieeeek

The asterisk can also be combined with a character class:

echo "baeeeet" | sed -n '/b[ae]*t/p'
# baeeeet

Extended Regular Expressions

gawk recognizes ERE patterns; ERE adds several symbols that extend the basic functionality.

Caution: sed and gawk use different regex engines. gawk supports most of the extended features; sed does not, but sed is faster.
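
As an aside, a small sketch assuming GNU or BSD sed: many modern sed implementations do accept ERE when invoked with the -E option, even though the classic sed the book describes does not:

```shell
# -E switches sed to extended regular expressions (GNU and BSD sed
# both support it; it is not part of historical sed)
echo "bet" | sed -E -n '/be?t/p'
# bet
echo "bt" | sed -E -n '/be?t/p'
# bt
```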

The question mark

The question mark (?) means the preceding character appears zero or one time:

"bt" | gawk '/be?t/{print $0}'
# bt
echo "bet" | gawk '/be?t/{print $0}'
# bet
echo "beet" | gawk '/be?t/{print $0}'
# no matched

The question mark can also be combined with a character class, allowing at most one character from the class:

echo "bt" | gawk '/b[ae]?t/{print $0}'
# bt
echo "bat" | gawk '/b[ae]?t/{print $0}'
# bat
echo "bot" | gawk '/b[ae]?t/{print $0}'
# no matched
echo "bet" | gawk '/b[ae]?t/{print $0}'
# bet
echo "beaet" | gawk '/b[ae]?t/{print $0}'
# no matched
echo "baet" | gawk '/b[ae]?t/{print $0}'
# no matched
echo "beat" | gawk '/b[ae]?t/{print $0}'
# no matched
echo "beet" | gawk '/b[ae]?t/{print $0}'
# no matched

The plus sign

The plus sign (+) means one or more occurrences:

echo "beet" | gawk '/be+t/{print $0}'
# beet
echo "bet" | gawk '/be+t/{print $0}'
# bet
echo "bt" | gawk '/be+t/{print $0}'
# no matched

Combined with a character class:

echo "bt" | gawk '/b[ae]+t/{print $0}'
# no matched
echo "bat" | gawk '/b[ae]+t/{print $0}'
# bat
echo "bet" | gawk '/b[ae]+t/{print $0}'
# bet
echo "baet" | gawk '/b[ae]+t/{print $0}'
# baet
echo "beet" | gawk '/b[ae]+t/{print $0}'
# beet
echo "beeet" | gawk '/b[ae]+t/{print $0}'
# beeet

Using braces

Braces specify an interval, i.e. how many times the preceding element may repeat:

  • m: The regular expression appears exactly m times
  • m,n: The regular expression appears at least m times, but no more than n times

Caution: gawk does not recognize intervals by default; add the --re-interval option to enable them.

echo "bt" | gawk --re-interval '/be{1}t/{print $0}'
# no matched
echo "bet" | gawk --re-interval '/be{1}t/{print $0}'
# bet
echo "beet" | gawk --re-interval '/be{1}t/{print $0}'
# no matched

Specifying a range for the number of occurrences:

echo "bt" | gawk --re-interval '/be{1,2}t/{print $0}'
# no matched
echo "bet" | gawk --re-interval '/be{1,2}t/{print $0}'
# bet
echo "beet" | gawk --re-interval '/be{1,2}t/{print $0}'
# beet
echo "beeet" | gawk --re-interval '/be{1,2}t/{print $0}'
# no matched

The same applies to a character class:

echo "bt" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
# no matched
echo "bat" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
# bat
echo "bet" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
# bet
echo "beat" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
# beat
echo "beet" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
# beet
echo "beeet" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
# no matched

The pipe symbol

The pipe symbol gives you OR logic: if any one alternative matches, the whole pattern matches: expr1|expr2|...

echo "The cat is asleep" | gawk '/cat|dog/{print $0}'
# The cat is asleep
echo "The dog is asleep" | gawk '/cat|dog/{print $0}'
# The dog is asleep
echo "The sheep is asleep" | gawk '/cat|dog/{print $0}'
# no matched

Combined with a character class:

echo "He has a hat" | gawk '/[ch]at|dog/{print $0}'
# He has a hat

Grouping expressions

Parentheses create a group, and the group is then treated as a single element:

echo "Sat" | gawk '/Sat(urday)?/{print $0}'
# Sat
echo "Saturday" | gawk '/Sat(urday)?/{print $0}'
# Saturday

Groups are often combined with pipes to express the possible combinations:

echo "cat" | gawk '/(c|b)a(b|t)/{print $0}'
# cat
echo "cab" | gawk '/(c|b)a(b|t)/{print $0}'
# cab
echo "bat" | gawk '/(c|b)a(b|t)/{print $0}'
# bat

Regular Expressions in Action

Hands-on: a few practical use cases.

Counting directory files

Count the number of executable files in the directories on your PATH.

Steps: get the paths with echo $PATH, split them on :, and list each directory's files with ls.

cat countfiles.sh                                     
#!/usr/local/bin/bash
# Count number of files in your PATH

mypath=$(echo $PATH | sed 's/:/ /g')
count=0
for directory in $mypath
do
    check=$(ls $directory)
    for item in $check
    do
        count=$[ $count + 1 ]
    done
    echo "$directory - $count"
    count=0
done

./countfiles.sh
# /usr/local/opt/mysql@5.7/bin - 4
# /Users/i306454/SAPDevelop/tools/maven/bin - 4
# ....
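
The same per-directory count can also be sketched as a short pipeline (a sketch assuming standard tr/ls/wc; nonexistent directories are skipped silently):

```shell
# Split PATH on ':' and count the entries ls reports in each directory
echo "$PATH" | tr ':' '\n' | while read -r dir; do
    n=$(ls "$dir" 2>/dev/null | wc -l | tr -d ' ')
    echo "$dir - $n"
done
```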

Validating a phone number

Write a script that validates phone numbers. Sample data:

000-000-0000
123-456-7890
212-555-1234
(317)555-1234
(202) 555-9876
33523
1234567890
234.123.4567

Rules: the following four formats are valid; everything else is not:

(123)456-7890
(123) 456-7890
123-456-7890
123.456.7890

Working out the pattern:

  • It may start with an opening parenthesis: ^\(?
  • The next three digits are the area code; the first is a digit other than 0 or 1, the next two are any digit: [2-9][0-9]{2}
  • An optional closing parenthesis: \)?
  • The separator can be nothing, a space, a hyphen, or a dot: (| |-|\.), using a group to treat the set as one unit and pipes to express OR
  • Three digits 0-9: [0-9]{3}
  • A space (not shown in the samples, but allowed), a hyphen, or a dot: ( |-|\.)
  • Four digits to end the line: [0-9]{4}$

The complete expression: ^\(?[2-9][0-9]{2}\)?(| |-|\.)[0-9]{3}( |-|\.)[0-9]{4}$

Test:

cat isphone              
#!/usr/local/bin/bash
# Script to filter out bad phone numbers

gawk --re-interval '/^\(?[2-9][0-9]{2}\)?(| |-|\.)[0-9]{3}( |-|\.)[0-9]{4}$/{print $0}'

cat phonelist | ./isphone
# 212-555-1234
# (317)555-1234
# (202) 555-9876
# 234.123.4567

PS: handling the optional middle separator with an empty alternative in the group is a trick I hadn't been aware of before.

Parsing an e-mail address

A regular expression to validate e-mail addresses:

  • The username part may contain letters, digits, underscores, hyphens, dots, and plus signs: ^([a-zA-Z0-9_\-\.\+]+)@
  • The hostname follows the same rules as the username: ([a-zA-Z0-9_\-\.\+]+)
  • The top-level domain is letters only, at least 2 and at most 5 characters: \.([a-zA-Z]{2,5})$

The complete expression: ^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.\+]+)\.([a-zA-Z]{2,5})$

Test:

cat isemail 
#!/usr/local/bin/bash
# Script to filter out bad email

gawk --re-interval '/^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.\+]+)\.([a-zA-Z]{2,5})$/{print $0}'

echo "rich@here.now" | ./isemail
# rich@here.now
echo "rich@here.now." | ./isemail
# no match
echo "rich.blum@here.now" | ./isemail
# rich.blum@here.now

Advanced sed

Looking at Multiline Commands

While using sed earlier you may have noticed one of its limitations: it works line by line. When sed receives a data stream, it splits the data on newline characters and processes one line at a time.

In practice, though, you often need to work across lines. Say you want to replace the phrase Linux System Administrators Group in a file; it may be split across two lines, and a single-line substitution would miss it.

For these situations sed provides three multiline commands:

  • N: adds the next line in the data stream to create a multiline group for processing
  • D: deletes a single line in a multiline group
  • P: prints a single line in a multiline group
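
A minimal sketch of N and P on a two-line stream (run with GNU sed; end-of-input behavior can differ slightly on other seds):

```shell
# N joins the next line into the pattern space, separated by \n
printf 'one\ntwo\n' | sed 'N; s/\n/ + /'
# one + two

# P prints only the first line of the two-line pattern space
printf 'one\ntwo\n' | sed -n 'N; P'
# one
```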
Using the single-line next command

In the example below we have five lines; lines 1, 3, and 5 have text while 2 and 4 are blank. The goal is to remove only the second line with sed:

cat data1.txt
# This is the header line.

# This is a data line.

# This is the last line.

The wrong approach, which deletes every blank line:

sed '/^$/d' data1.txt       
# This is the header line.
# This is a data line.
# This is the last line.

With the n command, the next line is pulled into scope before the delete runs:

sed '/header/{n ; d}' data1.txt
# This is the header line.
# This is a data line.

# This is the last line

PS: this didn't work with the sed on macOS; I ran the experiment on Ubuntu.

In short, n moves on to the next line rather than merging it with the current one. Honestly the example above still isn't fully clear to me; I may need another book to fill in the gap.

Combining lines of text

The single-line next command moves the next line of text from the data stream into the processing space(called the pattern space) of the sed editor.

The multiline version of the next command(which uses a captial N) adds the next line of text to the text already in the pattern space.

An uppercase N joins two lines into one pattern space, separated by a newline:

cat data2.txt
# This is the header line.
# This is the first data line.
# This is the second data line
# This is the last line

sed '/first/{N; s/\n/ /}' data2.txt
# This is the header line.
# This is the first data line. This is the second data line
# This is the last line

In the example above, we find the line containing first, append the following line, and replace the newline with a space while processing.

Another example, where the text we want to match falls across two lines:

cat data3.txt
# On Tuesday, the Linux System
# Administrator's group meeting will be held.
# All System Administrators should attend.
# Thank you for your attendance.

Here the first occurrence fails to be replaced and the second succeeds, because the first occurrence spans a newline while the pattern uses a space:

sed 'N;s/System Administrator/Desktop User/' data3.txt
# On Tuesday, the Linux System
# Administrator's group meeting will be held.
# All Desktop Users should attend.
# Thank you for your attendance

This version replaces both occurrences, but the newline disappears:

sed 'N;s/System.Administrator/Desktop User/' data3.txt
# On Tuesday, the Linux Desktop User's group meeting will be held.
# All Desktop Users should attend.
# Thank you for your attendance.

Use two substitutions to handle the newline case and the space case separately:

sed 'N
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data3.txt
# On Tuesday, the Linux Desktop
# User's group meeting will be held.
# All Desktop Users should attend.
# Thank you for your attendance.

There is still one small problem: because the script starts with N, sed fetches the next line first. When it reaches the last line there is no next line, so sed simply finishes, and a match sitting on the last line is missed:

cat data4.txt
# On Tuesday, the Linux System
# Administrator's group meeting will be held.
# All System Administrators should attend.

sed 'N
s/System\nAdministrator/Desktop\nUser/
s/System Administrator/Desktop User/
' data4.txt
# On Tuesday, the Linux Desktop
# User's group meeting will be held.
# All System Administrators should attend.

You can fix this by reordering the commands:

sed '
> s/System Administrator/Desktop User/
> N
> s/System\nAdministrator/Desktop\nUser/
> ' data4.txt
# On Tuesday, the Linux Desktop
# User's group meeting will be held.
# All Desktop Users should attend.

(; ̄ェ ̄) Merciless. This is way too fiddly.

When you delete with d after an N, both lines in the pattern space are removed:

cat data4.txt
# On Tuesday, the Linux System
# Administrator's group meeting will be held.
# All System Administrators should attend.

sed 'N ; /System\nAdministrator/d' data4.txt
# All System Administrators should attend.

sed provides D, which deletes only the first line of the multiline pattern space:

sed 'N ; /System\nAdministrator/D' data4.txt
# Administrator's group meeting will be held.
# All System Administrators should attend.

A similar trick can delete a blank line at the start of a file:

cat -n data1.txt
# 1
# 2 This is the header line.
# 3
# 4 This is a data line.
# 5
# 6 This is the last line

sed '/^$/{N;/header/D}' data1.txt | cat -n
# 1 This is the header line.
# 2
# 3 This is a data line.
# 4
# 5 This is the last line

P is the counterpart of p, used the same way as D above: with a multiline pattern space, p prints both lines while P prints only the first:

sed -n 'N ; /System\nAdministrator/P' data3.txt
# On Tuesday, the Linux System
sed -n 'N ; /System\nAdministrator/p' data3.txt
# On Tuesday, the Linux System
# Administrator's group meeting will be held.

Holding Space

The pattern space is the buffer where sed holds the text it is currently processing, but it is not the only place text can live; there is also a buffer called the hold space. Five commands operate on it:

The sed Editor Hold Space Commands

Command Description
h Copies pattern space to hold space
H Appends pattern space to hold space
g Copies hold space to pattern space
G Appends hold space to pattern space
x Exchanges contents of pattern and hold spaces

These commands let you set text aside and free the pattern space for other lines. Generally, after moving text into the hold space with h/H, you eventually bring it back with g/G/x:

cat data2.txt
# This is the header line.
# This is the first data line.
# This is the second data line
# This is the last line

sed -n '/first/ {h ; p ; n ; p ; g ; p }' data2.txt
# This is the first data line.
# This is the second data line
# This is the first data line.

Breaking down the command:

  1. sed filters for the line containing first
  2. When it matches, the commands in {} run; h copies the line into the hold space
  3. The first p prints the current pattern space
  4. n fetches the next line into the pattern space
  5. The second p prints the current pattern space, i.e. the line containing second
  6. g copies the hold space back into the pattern space
  7. The third p prints the current pattern space, i.e. the line containing first again

Negating a Command

The exclamation point (!) negates a command:

cat data2.txt
# This is the header line.
# This is the first data line.
# This is the second data line
# This is the last line
sed -n '/header/p' data2.txt
# This is the header line.
sed -n '/header/!p' data2.txt
# This is the first data line.
# This is the second data line
# This is the last line

N can be negated too. Recall the earlier example:

sed 'N
s/System\nAdministrator/Desktop\nUser/
s/System Administrator/Desktop User/
' data4.txt
# On Tuesday, the Linux Desktop
# User's group meeting will be held.
# All System Administrators should attend.

sed '$!N
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data4.txt
# On Tuesday, the Linux Desktop
# User's group meeting will be held.
# All Desktop Users should attend.

$!N means: skip the N command on the last line.

With the techniques introduced above, you can use the hold space to reverse the lines of a file:

  1. Place a line in the pattern space
  2. Copy the line from the pattern space to the hold space
  3. Put the next line of text in the pattern space
  4. Append the hold space to the pattern space
  5. Place everything in the pattern space into the hold space
  6. Repeat steps 3-5 until you've put all the lines in reverse order in the hold space
  7. Retrieve the lines, and print them

cat -n data2.txt
# 1 This is the header line.
# 2 This is the first data line.
# 3 This is the second data line
# 4 This is the last line

sed -n '{1!G; h; $p}' data2.txt | cat -n
# 1 This is the last line
# 2 This is the second data line
# 3 This is the first data line.
# 4 This is the header line.
  • 1!G: on every line except the first, append the hold space; skipping line 1 avoids a leading blank line
  • h: copy the pattern space to the hold space
  • $p: on the last line, print

This is damn ingenious; I doubt I could have come up with it (; ̄ェ ̄)

PS: if you really just want to reverse a file, use tac, which is cat spelled backwards.
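
A quick sketch (tac ships with GNU coreutils; on macOS the closest equivalent is tail -r):

```shell
# tac prints its input lines in reverse order
printf 'one\ntwo\nthree\n' | tac
# three
# two
# one
```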

Changing the Flow

By default sed works through its commands from top to bottom, but it also provides ways to change the flow, a little like a structured programming language.

Branching

The effect is like negation, except the address can skip a whole batch of commands for the selected lines.

branch command: [address]b [label]

In the example below, 2,3b makes sed skip the substitutions for lines 2 through 3:

cat data2.txt
# This is the header line.
# This is the first data line.
# This is the second data line.
# This is the last line.

sed '{2,3b; s/This is/Is this/; s/line./test?/}' data2.txt
# Is this the header test?
# This is the first data line.
# This is the second data line.
# Is this the last test?

A label sets a jump target. After the first example I wanted to call it an if condition, but goto is really the better analogy. A label can be at most 7 characters long.

In the example below, jump1 acts more like an if: on a match, sed skips the intervening command and resumes at the commands after :jump1.

cat data2.txt
# This is the header line.
# This is the first data line.
# This is the second data line.
# This is the last line.

sed '{/first/b jump1; s/This is the/No jump on/
> :jump1
> s/This is the/Jump here on/}' data2.txt
# No jump on header line.
# Jump here on first data line.
# No jump on second data line.
# No jump on last line.

When the address of b matches, the first substitution is skipped and only the one after the label runs. An even slicker trick is this loop that strips the commas one by one:

echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> b start
> }'
# This is, a, test, to, remove, commas.
# This is a, test, to, remove, commas.
# This is a test, to, remove, commas.
# This is a test to, remove, commas.
# This is a test to remove, commas.
# This is a test to remove commas.
# ^C

I get the general idea, but at first I didn't see why the text stays available on every pass; doesn't it get flushed? The answer lies in the pattern space: branching back to the label never ends the cycle, so each s/,//1p keeps editing the same buffer. As written, the command never terminates and has to be killed with Ctrl+C; below is the improved version:

echo "This, is, a, test, to, remove, commas." | sed -n '{
:start
s/,//1p
/,/b start
}'
# This is, a, test, to, remove, commas.
# This is a, test, to, remove, commas.
# This is a test, to, remove, commas.
# This is a test to, remove, commas.
# This is a test to remove, commas.
# This is a test to remove commas.

Testing

The syntax mirrors branch: [address]t [label]

The test command provides a cheap way to perform a basic if-then statement on the text in the data stream:

cat data2.txt
# This is the header line.
# This is the first data line.
# This is the second data line.
# This is the last line.
sed '{
> s/first/matched/
> t
> s/This is the/No match on/
> }' data2.txt
# No match on header line.
# This is the matched data line.
# No match on second data line.
# No match on last line.

If the substitution before t succeeds, the jump is taken; otherwise the next command runs. Here is the comma-stripping loop rewritten with t:

echo "This, is, a, test, to, remove, commas." | sed -n '{
:start
s/,//1p
t start
}'
# This is, a, test, to, remove, commas.
# This is a, test, to, remove, commas.
# This is a test, to, remove, commas.
# This is a test to, remove, commas.
# This is a test to remove, commas.
# This is a test to remove commas.

Replacing via a Pattern

Exact replacement with sed is straightforward; here we wrap cat in double quotes:

echo "The cat sleeps in his hat." | sed 's/cat/"cat"/'
# The "cat" sleeps in his hat.

But if you want to quote every word that matches .at, you run into a problem:

echo "The cat sleeps in his hat." | sed 's/.at/".at"/g'
# The ".at" sleeps in his ".at".

Using the ampersand

To solve this, sed provides the ampersand (&), which stands in for whatever the pattern matched:

echo "The cat sleeps in his hat." | sed 's/.at/"&"/g'
# The "cat" sleeps in his "hat".

Replacing individual words

If you only want to replace part of the match; in plain terms, groups save typing:

echo "This System Administractor manual" | sed '
s/\(System\) Administractor/\1 User/'
# This System User manual
  • group 需要用反斜线
  • 指代 group 用反斜线加数子

In the next example we replace the match with just a part of itself:

echo "That furry cat is pretty" | sed 's/furry \(.at\)/\1/'
# That cat is pretty

This trick is very handy for inserting values:

echo "1234567" | sed '{                                    
:start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
}'
# 1,234,567

There are two groups:

  • .*[0-9]
  • [0-9]{3}

The first pass produces 1234,567; the second, 1,234,567.

Placing sed Commands in Scripts

Some techniques for using sed inside shell scripts.

Using wrappers

Typing out long sed commands every time gets tedious; put them in a script and call that instead.

Below, the reverse trick implemented earlier is wrapped in a script:

cat reverse.sh
#!/bin/bash
# Shell wrapper for sed editor script.
# to reverse text file lines
sed -n '{1!G; h; $p}' $1

cat data2.txt
# This is the header line.
# This is the first data line.
# This is the second data line.
# This is the last line.

./reverse.sh data2.txt
# This is the last line.
# This is the second data line.
# This is the first data line.
# This is the header line.

Redirecting sed output

sed's output can be wrapped in $() and captured as a value.

In this example we compute a factorial and use the sed expression written earlier to add comma separators:

cat fact.sh                        
#!/usr/local/bin/bash
# Add commas to number in factorial answer

factorial=1
counter=1
number=$1
#
while [ $counter -le $number ]
do
    factorial=$[ $factorial * $counter ]
    counter=$[ $counter + 1 ]
done
#
result=$(echo $factorial | sed '{
:start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
}')
#
echo "The result is $result"
#

./fact.sh 20
# The result is 2,432,902,008,176,640,000

Creating sed Utilities

A few handy data-processing utilities.

Spacing with double lines

Add a blank line after every line of a file; if you don't want one after the last line, negate with !:

sed 'G' data2.txt  | cat -n
# 1 This is the header line.
# 2
# 3 This is the first data line.
# 4
# 5 This is the second data line.
# 6
# 7 This is the last line.
# 8
sed '$!G' data2.txt | cat -n
# 1 This is the header line.
# 2
# 3 This is the first data line.
# 4
# 5 This is the second data line.
# 6
# 7 This is the last line.

Spacing files that may have blanks

If the file already contains blank lines, the trick above can leave several in a row. How do you guarantee exactly one?

sed '$!G' data6.txt | cat -n
# 1 This is line number1.
# 2
# 3 This is line number2.
# 4
# 5
# 6
# 7 This is line number3.
# 8
# 9 This is line number4.

Solution: delete all the existing blank lines first, then add them back:

sed '/^$/d; $!G;' data6.txt  | cat  -A
# This is line number 1.$
# $
# This is line number 2.$
# $
# This is line number 3.$
# $
# This is line number 4.$

PS: adding blank lines with the i\ syntax leaves a stray space on each inserted line; the G-based approach is preferred.

Numbering lines in a file

Chapter 19 introduced the = command for displaying line numbers:

sed '=' data2.txt 
1
This is the header line.
2
This is the first data line.
3
This is the second data line.
4
This is the last line.

That format looks odd; it is friendlier to have the number and the text on the same line, and N makes that easy:

sed '=' data2.txt | sed 'N; s/\n/ /'
# 1 This is the header line.
# 2 This is the first data line.
# 3 This is the second data line.
# 4 This is the last line.

The biggest advantage of this approach is that it adds no extra padding. Other tools, such as nl and cat -n, pad the numbers with spaces:

nl data2.txt
# 1 This is the header line.
# 2 This is the first data line.
# 3 This is the second data line.
# 4 This is the last line.

cat -n data2.txt
# 1 This is the header line.
# 2 This is the first data line.
# 3 This is the second data line.
# 4 This is the last line.

Printing last lines

Print only the last line:

sed -n '$p' data2.txt
# This is the last line.

With a similar technique you can display the last few lines of a file, a so-called rolling window.

A rolling window uses N to accumulate a block of text in the pattern space.

The example below uses sed to show the last 10 lines of a file:

cat data7.txt
# This is line 1.
# This is line 2.
# This is line 3.
# This is line 4.
# This is line 5.
# This is line 6.
# This is line 7.
# This is line 8.
# This is line 9.
# This is line 10.
# This is line 11.
# This is line 12.
# This is line 13.
# This is line 14.
# This is line 15.

sed '{
:start
$q; N; 11,$D
b start
}' data7.txt
# This is line 6.
# This is line 7.
# This is line 8.
# This is line 9.
# This is line 10.
# This is line 11.
# This is line 12.
# This is line 13.
# This is line 14.
# This is line 15.
  • $q: on the last line, quit
  • N: append the next line to the pattern space
  • 11,$D: once past line 10, delete the first line of the pattern space

Deleting lines

This section shows some quick ways to remove blank lines.

Deleting consecutive blank lines

Deleting surplus blank lines, with a different approach this time: any range that starts at a non-blank line and ends at the next blank line is kept, and everything else is deleted:

cat -n data8.txt   
# 1 This is the header line.
# 2
# 3
# 4 This is the first data line.
# 5
# 6 This is the second data line.
# 7
# 8
# 9 This is the last line.
# 10

sed '/./,/^$/!d' data8.txt | cat -n
# 1 This is the header line.
# 2
# 3 This is the first data line.
# 4
# 5 This is the second data line.
# 6
# 7 This is the last line.
# 8
Deleting leading blank lines

Delete the blank lines at the start of a file:

cat -n data9.txt 
# 1
# 2
# 3 This is line one.
# 4
# 5 This is line two.

sed '/./,$!d' data9.txt | cat -n
# 1 This is line one.
# 2
# 3 This is line two.

Everything from the first non-blank line through to the end is kept.

Deleting trailing blank lines

Deleting trailing blank lines is trickier than deleting leading ones; it takes a little cunning and a loop:

cat -n data10.txt
# 1 This is the first line.
# 2 This is the second line.
# 3
# 4
# 5

sed '{
:start
/^\n*$/{$d; N; b start}
}' data10.txt | cat -n
# 1 This is the first line.
# 2 This is the second line.

The pattern matches a pattern space containing only newlines: on the last line it is deleted; otherwise another line is appended and the test runs again.

Removing HTML tags

cat data11.txt
# <html>
# <head>
# <title>This is the page title</title>
# </head>
# <body>
# <p>
# This is the <b>first</b> line in the Web page.
# This should provide some <i>useful</i>
# information to use in our sed script.
# </p>
# </body>
# </html>

Naively using s/<.*>//g causes problems; text wrapped like <b>abc</b> gets deleted entirely:

sed 's/<.*>//g' data11.txt | cat -n 
# 1
# 2
# 3
# 4
# 5
# 6
# 7 This is the line in the Web page.
# 8 This should provide some
# 9 information to use in our sed script.
# 10
# 11
# 12

This happens because sed treats the embedded > characters as part of .*. Fix it with s/<[^>]*>//g, combined with the blank-line deletion:

sed 's/<[^>]*>//g; /^$/d' data11.txt             
# This is the page title
# This is the first line in the Web page.
# This should provide some useful
# information to use in our sed script.

Chapter 22: Advanced gawk

Using Variables

gawk supports two different kinds of variables:

  • Built-in variables
  • User-defined variables

Built-in variables

This section shows how to use gawk's built-in variables.

The field and record separator variables

The gawk Data Field and Record Variables

Variable Description
FIELDWIDTHS A space-separated list of numbers defining the exact width (in spaces) of each data field
FS Input field separator character
RS Input record separator character
OFS Output field separator character
ORS Output record separator character

The example below shows FS in use: set the field separator and print only the first three fields of each line:

cat data1
# data11,data12,data13,data14,data15
# data21,data22,data23,data24,data25
# data31,data32,data33,data34,data35

gawk 'BEGIN{FS=","} {print $1, $2, $3}' data1
# data11 data12 data13
# data21 data22 data23
# data31 data32 data33

OFS sets the separator used in the output:

gawk 'BEGIN{FS=","; OFS="--"} {print $1, $2, $3}' data1
# data11--data12--data13
# data21--data22--data23
# data31--data32--data33

Some data is laid out in fixed-width columns rather than delimited by a separator character; that is what FIELDWIDTHS is for:

cat data1b                                                               
# 1005.3247596.37
# 115-2.349194.00
# 05810.1298100.1

gawk 'BEGIN{FIELDWIDTHS="3 5 2 5"} {print $1, $2, $3 $4}' data1b
# 100 5.324 7596.37
# 115 -2.34 9194.00
# 058 10.12 98100.1

PS: FIELDWIDTHS must be a constant; variables are not supported.

RS/ORS operate on records; the default RS is the newline character.

Here is an address-parsing example, where we want each person's name and phone number:

cat data2    
# Riley Mullen
# 123 Main Street
# Chicago, IL 60601
# (312)555-1234

# Frank Williams
# 456 Oak Street
# Indianapolis, IN 46201
# (317)555-9876

# Haley Snell
# 4231 Elm Street
# Detroit, MI 48201
# (313)555-4938

With the default newline record separator this can't be parsed. Instead, make \n the field separator and a blank line the record separator:

gawk 'BEGIN{FS="\n"; RS=""} {print $1, $4}' data2 
# Riley Mullen (312)555-1234
# Frank Williams (317)555-9876
# Haley Snell (313)555-4938
Data variables

More gawk Built-In Variables

Variable Description
ARGC The number of command line parameters present
ARGIND The index in ARGV of the current file being processed
ARGV An array of command line parameters
CONVFMT The conversion format for numbers (see the printf statement), with a default value of %.6g
ENVIRON An associative array of the current shell environment variables and their values
FNR The current record number in the data file
NF The total number of data fields in the data file
NR The number of input records processed

(I'll spare myself typing out the rest.)

ARGC and ARGV work much like the shell's, except that ARGV does not count the script itself, which is a notable difference:

gawk 'BEGIN{print ARGC, ARGV[0], ARGV[1]}' data1
# 2 gawk data1

PS: the expression syntax also differs a little; built-in variables are not prefixed with $.

Reading environment variables:

gawk '                                          
quote> BEGIN{
quote> print ENVIRON["HOME"]
quote> print ENVIRON["PATH"]
quote> }'
# /Users/i306454
# ...

FNR, NF, and NR track positions in the input. NF lets you work with the last field without knowing how many fields there are:

gawk 'BEGIN{FS=":"; OFS="--"} {print $1,$NF}' /etc/passwd
# _nearbyd--/usr/bin/false
# ...
cat /etc/passwd | tail -3
# _coreml:*:280:280:CoreML Services:/var/empty:/usr/bin/false
# _trustd:*:282:282:trustd:/var/empty:/usr/bin/false
# _oahd:*:441:441:OAH Daemon:/var/empty:/usr/bin/false

FNR is the record number within the current file. Below we pass the same data file twice; FNR resets after each file:

gawk 'BEGIN{FS=","}{print $1, "FNR="FNR}' data1 data1    
# data11 FNR=1
# data21 FNR=2
# data31 FNR=3
# data11 FNR=1
# data21 FNR=2
# data31 FNR=3

NR, by contrast, counts records across all the input data:

gawk 'BEGIN{FS=","}                            
quote> {print $1, "FNR="FNR, "NR="NR}
quote> END{print "There were", NR, "records processed"}' data1 data1
# data11 FNR=1 NR=1
# data21 FNR=2 NR=2
# data31 FNR=3 NR=3
# data11 FNR=1 NR=4
# data21 FNR=2 NR=5
# data31 FNR=3 NR=6
# There were 6 records processed

When processing a single file, FNR and NR agree; with multiple files they differ.

User-defined variables

gawk user-defined variable names cannot begin with a digit and are case sensitive.

Assigning variables in scripts

Assignment works just as it does in the shell. Below we assign a number and then a string to the same variable; arithmetic is supported too:

gawk '             
quote> BEGIN{
quote> testing="This is a test"
quote> print testing
quote> testing=45
quote> print testing
quote> }'
# This is a test
# 45
gawk 'BEGIN{x=4; x=x*2+3; print x}'
# 11
Assigning variables on the command line

gawk can also take variable assignments from the command line:

cat script1               
# BEGIN{FS=","}
# {print $n}

gawk -f script1 n=2 data1
# data12
# data22
# data32
gawk -f script1 n=3 data1
# data13
# data23
# data33

Looks good, but there is a catch: variables assigned on the command line are not visible inside BEGIN:

cat script2                    
# BEGIN{print "The starting value is", n; FS=","}
# {print $n}

gawk -f script2 n=3 data1
# The starting value is
# data13
# data23
# data33

The -v option solves this:

gawk -v n=3 -f script2 data1
# The starting value is 3
# data13
# data23
# data33

Working with Arrays

Like many other languages, gawk provides arrays, called associative arrays. Personally I'd call them maps: you fetch a value by its key.

Defining array variables

Format: var[index] = element

gawk 'BEGIN{
captial["Illinois"] = "Springfield"
print captial["Illinois"]
}'
# Springfield

It works for numbers too:

gawk 'BEGIN{
quote> var[1] = 34
quote> var[2] = 3
quote> total = var[1] + var[2]
quote> print total
quote> }'
# 37

Iterating through array variables

gawk's syntax for iterating over an array:

for (var in arry)
{
statements
}

An iteration example:

gawk 'BEGIN{
quote> var["a"] = 1
quote> var["g"] = 2
quote> var["m"] = 3
quote> var["u"] = 4
quote> for (test in var)
quote> {
quote> print "index:", test, " - value:", var[test]
quote> }
quote> }'
# index: u - value: 4
# index: m - value: 3
# index: a - value: 1
# index: g - value: 2

The implementation is hash-based, so iteration order is not guaranteed.
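
When a stable order matters, one portable sketch is simply to pipe the output through sort (gawk 4.0+ also has its own mechanism, the PROCINFO["sorted_in"] setting, noted in the comment):

```shell
# Piping through sort gives a deterministic listing regardless of the
# array's internal hash order.
# gawk-only alternative: set PROCINFO["sorted_in"] = "@ind_str_asc"
# before the for-in loop.
gawk 'BEGIN{
    var["g"] = 2; var["a"] = 1; var["m"] = 3
    for (test in var) print "index:", test, "- value:", var[test]
}' | sort
# index: a - value: 1
# index: g - value: 2
# index: m - value: 3
```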

Deleting array variables

Syntax: delete array[index]

gawk 'BEGIN{
var["a"] = 1
var["g"] = 2
for (test in var)
{
print "Index:", test, " - Value:", var[test]
}
delete var["g"]
print "---"
for (test in var)
print "Index:", test, " - Value:", var[test]
}'
# Index: a - Value: 1
# Index: g - Value: 2
# ---
# Index: a - Value: 1

PS: so it supports single-line for bodies too...

Using Patterns

This chapter covers how to define patterns.

Regular expressions

gawk supports both BRE and ERE; make sure the regex appears before the program script:

gawk 'BEGIN{FS=","} /11/{print $1}' data1    
# data11

gawk 'BEGIN{FS=","} /,d/{print $1}' data1
# data11
# data21
# data31

The matching operator

gawk uses the tilde (~) as its matching operator, in the form $1 ~ /^data/. In the example below we match on the field in position $2; records whose second field begins with data2 are what we're after:

gawk 'BEGIN{FS=","} $2 ~ /^data2/{print $0}' data1
# data21,data22,data23,data24,data25

This trick comes up constantly in gawk. Here we find entries in the passwd file whose first field contains root:

gawk -F: '$1 ~/root/{print $1, $NF}' /etc/passwd               
# root /bin/sh
# _cvmsroot /usr/bin/false

The operator can also be negated: $1 !~ /expression/

gawk 'BEGIN{FS=","} $2 !~ /^data2/{print $0}' data1
# data11,data12,data13,data14,data15
# data31,data32,data33,data34,data35

Mathematical expressions

gawk also supports doing arithmetic comparisons right in the expression. Below we list the users whose group ID equals 0:

gawk -F: '$4 == 0{print $1}' /etc/passwd           
# root

The supported comparison operators:

  • x == y: Value x is equal to y
  • x <= y: Value x is less than or equal to y
  • x < y: Value x is less than y
  • x >= y: Value x is greater than or equal to y
  • x > y: Value x is greater than y

== also works on text, but it means an exact match:

gawk -F, '$1 == "data"{print $1}' data1  
# no match
gawk -F, '$1 == "data11"{print $1}' data1
# data11

Structured Commands

Structured commands for scripts.

The if statement

gawk supports the if-then-else construct:

if (condition)
statement1

# the one-line form works too
if (condition) statement1
cat data4 
# 10
# 5
# 13
# 50
# 34
gawk '{if ($1 > 20) print $1}' data4
# 50
# 34

If the if branch has multiple statements, wrap them in braces:

gawk '{                             
quote> if ($1 > 20)
quote> {
quote> x = $1 * 2
quote> print x
quote> }
quote> }' data4
# 100
# 68

An example with else:

gawk '{
quote> if ($1 > 20)
quote> {
quote> x = $1 * 2
quote> print x
quote> } else
quote> {
quote> x = $1 / 2
quote> print x
quote> }}' data4
# 5
# 2.5
# 6.5
# 100
# 68

It can also go on one line, in the form if (condition) statement1; else statement2:

gawk '{if ($1 > 20) print $1 * 2; else print $1 /2}' data4
# 5
# 2.5
# 6.5
# 100
# 68

The while statement

Format:

while (condition)
{
statements
}
cat data5 
# 130 120 135
# 160 113 140
# 145 170 215

gawk '{
total = 0
i = 1
while (i<4)
{
total += $i
i++
}
avg = total / 3
print "Average:", avg
}' data5
# Average: 128.333
# Average: 137.667
# Average: 176.667

break and continue can be used to interrupt the loop:

gawk '{
quote> total = 0
quote> i = 1
quote> while (i<4)
quote> {
quote> total += $i
quote> if (i == 2)
quote> break
quote> i++
quote> }
quote> avg = total/2
quote> print "The average of the first two data is:" , avg
quote> }' data5
# The average of the first two data is: 125
# The average of the first two data is: 136.5
# The average of the first two data is: 157.5

The do-while statement

Format:

do
{
statemnets
} while (condition)
gawk '{
quote> total = 0
quote> i = 1
quote> do
quote> {
quote> total += $i
quote> i++
quote> } while (total < 150)
quote> print total }' data5
# 250
# 160
# 315

The for statement

Format: for( variable assignment; condition; iteration process)

gawk '{
quote> total = 0
quote> for (i=1; i<4; i++)
quote> {
quote> total += $i
quote> }
quote> avg = total /3
quote> print "Average:", avg
quote> }' data5
# Average: 128.333
# Average: 137.667
# Average: 176.667

Formatted Printing

gawk uses printf for formatted output: printf "format string", var1, var2...

The format string is the key to formatted output; it follows the same conventions as C's printf. The format is %[modifier]control-letter.

Format Specifier Control Letters

Control Letter Description
c Displays a number as an ASCII character
d Displays an integer value
i Displays an integer value(same as d)
e Displays a number in scientific notation
f Displays a floating-point value
g Displays either scientific notation or floating point, whichever is shorter
o Displays an octal value
s Displays a text string
x Displays a hexadecimal value
X Displays a hexadecimal value, but using capital letters for A through F
gawk 'BEGIN{
x = 1 * 100
printf "The answer is: %e\n", x
}'
# The answer is: 1.000000e+02

除了上面的控制符外,printf 还提供了另外三个控制项

  • width, 控制宽度,输出不足设定宽度时用空格补齐,超出则按实际宽度输出
  • prec, 精度,控制浮点数小数点后的位数,如 %5.1f
  • -(minus sign) 强制左对齐
gawk 'BEGIN{FS="\n"; RS=""} {print $1, $4}' data2
# Riley Mullen (312)555-1234
# Frank Williams (317)555-9876
# Haley Snell (313)555-4938

gawk 'BEGIN{FS="\n"; RS=""} {printf "%s %s \n", $1, $4}' data2
# Riley Mullen (312)555-1234
# Frank Williams (317)555-9876
# Haley Snell (313)555-4938

如果用 printf 需要自己打印换行符号,这种设定当你想将多行数据放在一行的时候就很好使

gawk 'BEGIN{FS=","} {printf "%s ", $1} END{printf "\n"}' data1
# data11 data21 data31

下面的例子我们通过 modifier 格式化名字这个字段

gawk 'BEGIN{FS="\n"; RS=""} {printf "%16s %s\n", $1, $4}' data2
#     Riley Mullen (312)555-1234
#   Frank Williams (317)555-9876
#      Haley Snell (313)555-4938

默认是右对齐的,可以使用 minus sign 来左对齐

gawk 'BEGIN{FS="\n"; RS=""} {printf "%-16s %s\n", $1, $4}' data2
# Riley Mullen     (312)555-1234
# Frank Williams   (317)555-9876
# Haley Snell      (313)555-4938

格式化浮点类型

gawk '{
total = 0
# 注意:i 从 0 开始会把整行 $0 也按数值累加进去,所以下面的平均值偏大,严格来说应写 i=1
for (i=0; i<4; i++)
{
total += $i
}
avg = total / 3
printf "Average: %5.1f\n", avg
}' data5
# Average: 171.7
# Average: 191.0
# Average: 225.0

Built-In Functions

gawk 提供了不少的内建函数帮助你完成一些特定功能。

Mathematical functions

The gawk Mathematical Functions

Function Description
atan2(x, y) The arctangent of x/y, with x and y specified in radians
cos(x) The cosine of x, with x specified in radians
exp(x) The exponential of x
int(x) The integer part of x, truncated toward 0
log(x) The natural logarithm of x
rand() A random floating point value larger than 0 and less than 1
sin(x) The sine of x, with x specified in radians
sqrt(x) The square root of x
srand(x) Specifies a seed value for calculating random numbers

gawk 是有计算上限的,比如 exp(1000) 就会抛错

gawk 还提供了位运算

  • and(v1, v2)
  • compl(val) 按位取反
  • lshift(val, count) 左移
  • or(v1, v2)
  • rshift(val, count)
  • xor(v1, v2) 异或
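这几个位运算函数可以快速验证一下(示例,这些函数是 gawk 扩展,POSIX awk 不支持):

```shell
# gawk 的位运算函数示例(gawk 扩展,POSIX awk 不支持)
gawk 'BEGIN{
    print and(12, 10)    # 1100 & 1010 = 1000 -> 8
    print or(12, 10)     # 1100 | 1010 = 1110 -> 14
    print xor(12, 10)    # 1100 ^ 1010 = 0110 -> 6
    print lshift(1, 3)   # 0001 << 3 -> 8
    print rshift(8, 2)   # 1000 >> 2 -> 2
}'
```

顺手可以用 bash 的 `$(( ))` 算术做交叉验证,比如 `echo $((12 & 10))` 也是 8。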

String functions

支持一些常规的字符操作,比如排序,截取,匹配,分割等

Function Description
split(s, r [,a]) This function splits s into array a using the FS character, or the regular expression r if supplied. It returns the number of fields
gawk 'BEGIN{x="testing"; print toupper(x); print length(x)}'
# TESTING
# 7

sort 比较复杂, 下面的 asort 例子中,我们将原始 map 和输出结果 test 传给 asort 然后遍历打印 test. 打印时可以看到,原来的字母 index 被替换成了数字

gawk 'BEGIN{
var["a"] = 1
var["g"] = 2
var["m"] = 3
var["u"] = 4
asort(var, test)
for (i in test)
print "Index:", i, " - value:", test[i]
}'
# Index: 1 - value: 1
# Index: 2 - value: 2
# Index: 3 - value: 3
# Index: 4 - value: 4

下面是 split 的测试

cat data1
# data11,data12,data13,data14,data15
# data21,data22,data23,data24,data25
# data31,data32,data33,data34,data35
gawk 'BEGIN{FS=","}{
split($0, var)
print var[1], var[5]
}' data1
# data11 data15
# data21 data25
# data31 data35

Time functions

Function Description
mktime(datespec) Converts a date specified in the format YYYY MM DD HH MM SS[DST] into a timestamp value
strftime(format[,timestamp]) Formats either the current time of day timestamp, or timestamp if provided, into a formatted day and date, using the date() shell function format
systime() Returns the timestamp for the current time of day

时间函数在处理带时间相关的 log 文件时很有用

gawk 'BEGIN{
date = systime()
day = strftime("%A, %B %d, %Y", date)
print day
}'
# Tuesday, June 15, 2021

User-Defined Functions

Defining a function

语法

function name([variables])
{
statements
}
function printthird()
{
print $3
}

允许返回值 return value

function myrand(limit)
{
return int(limit * rand())
}

Using your functions

当你定义一个函数的时候,必须在使用它之前定义好,通常放在脚本最开始的位置(BEGIN 之前).

gawk '
function myprint()
{
printf "%-16s - %s\n", $1, $4
}
BEGIN{FS="\n"; RS=""}
{
myprint()
}' data2
# Riley Mullen - (312)555-1234
# Frank Williams - (317)555-9876
# Haley Snell - (313)555-4938

Creating a function library

  1. 为自定义函数创建库文件
  2. 将 gawk 脚本也存到文件中
  3. 在终端同时调用两个脚本
cat funclib 
# function myprint()
# {
# printf "%-16s - %s\n", $1, $4
# }
# function myrand(limit)
# {
# return int(limit * rand())
# }
# function printthird()
# {
# print $3
# }
cat script4
# BEGIN{ FS="\n"; RS=""}
# {
# myprint()
# }
gawk -f funclib -f script4 data2
# Riley Mullen - (312)555-1234
# Frank Williams - (317)555-9876
# Haley Snell - (313)555-4938

Working through a Practical Example

When working with data files, the key is to first group related data records together and then perform any calculations required on the related data.

下面是一个保龄球得分统计的例子, 每一行分别包含 名字,组名,得分 的信息

cat scores.txt 
# Rich Blum,team1,100,115,95
# Barbara Blum,team1,110,115,100
# Christine Bresnahan,team2,120,115,118
# Tim Bresnahan,team2,125,112,116

目标:统计每个 team 的总分以及平均分

cat bowling.sh                               
#!/usr/local/bin/bash

for team in $(gawk -F, '{print $2}' scores.txt | uniq)
do
gawk -v team=$team '
BEGIN{ FS=","; total=0 }
{
if ($2==team)
{
total += $3 + $4 + $5;
}
}
END {
avg = total/6;
print "Total for", team, "is", total, ", the average is",avg
}
' scores.txt
done

./bowling.sh
# Total for team1 is 635 , the average is 105.833
# Total for team2 is 706 , the average is 117.667

计算方法:先遍历一遍文件取得所有的组名,然后对每个组名再完整遍历一遍文件,统计并打印该组的结果。一开始我还以为是一次遍历就能对结果做分类统计的,按每个组各循环一遍来理解的话,还是很容易理解的
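其实用 gawk 的关联数组,一次遍历就能完成分组统计。下面是一个假设性的改写(测试数据在脚本里重新造一份,保证示例自包含;每行 3 个得分,所以分母按每行加 3 计):

```shell
# 用关联数组一次遍历完成分组统计(gawk/awk 都支持)
cat > scores.txt <<'EOF'
Rich Blum,team1,100,115,95
Barbara Blum,team1,110,115,100
Christine Bresnahan,team2,120,115,118
Tim Bresnahan,team2,125,112,116
EOF

gawk -F, '
{
    total[$2] += $3 + $4 + $5   # 以组名($2)为 key 累加得分
    count[$2] += 3              # 每行有 3 个得分
}
END {
    for (t in total)
        print "Total for", t, "is", total[t], ", the average is", total[t]/count[t]
}' scores.txt
```

team1 输出 635 / 105.833,team2 输出 706 / 117.667;注意 for..in 的遍历顺序在 gawk 中不保证,两行的先后可能互换。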

Working with Alternative Shells

介绍除 bash 外其他一些常见的 shell, 暂时不关心,pass

Chapter 1: Starting with Linux Shells

What Is Linux

Linux 系统主要由以下 4 部分组成

  • The Linux kernel
  • The GNU utilities
  • A graphical desktop environment
  • Application software

Looking into the Linux Kernel

Linux 系统的核心就是 kernel,它起到调度硬件软件资源的作用。

kernel 有四个主要的功能

  • System memory management
  • Software program management
  • Hardware management
  • Filesystem management
System Memory management

下面的内容和操作系统相关,很多概念我都不是很感兴趣,可以先跳过

Chapter 2: Getting to the Shell

终端介绍,跳过

Chapter 3: Basic bash Shell Commands

Interacting with the bash Manual

man page 的结构如下

Section Description
Name Displays command name and a short description
Synopsis Shows command syntax
Configuration Provides configuration information
Description Describes command generally
Options Describes command option(s)
Exit Status Defines command exit status indicator(s)
Return Value Describes command return value(s)
Errors Provides command error(s)
Environment Describes environment variable(s) used
Files Defines files used by command
Versions Describes command version information
Conforming To Provides standards followed
Notes Describes additional helpful command material
Bugs Provides the location to report found bugs
Example Shows command use examples
Authors Provides information on command developers
Copyright Defines command code copyright status
See Also Refers similar available commands

常见的目录及用途

Directory Usage
/ root of the virtual directory, where normally, no files are placed
/bin binary directory, where GNU user-level utilities are stored
/boot boot directory, where boot files are stored
/dev device directory, where Linux creates device nodes
/etc system configuration files directory
/home home directory, where Linux creates user directories
/lib library directory, where system and application library files are stored
/media media directory, a common place for mount points used for removable media
/mnt mount directory, another common place for mount points used for removable media
/opt optional directory, often used to store third-part software packages and data files
/proc process directory, where current hardware and process information is stored
/root root home directory
/sbin system binary directory, where many GNU admin-level utilities are stored
/run run directory, where runtime data is held during system operation
/srv service directory, where local services store their files
/sys system directory, where system hardware information files are stored
/tmp temporary directory, where temporary work files can be created and destroyed
/usr user binary directory, where the bulk of GNU user-level utilities and data files are stored
/var variable directory, for files that change frequently, such as log files

Listing Files and Directories

Displaying a basic list

展示文件命令 ls

# 如果终端没有配置颜色,使用 -F 区分文件和文件夹,可执行文件后会加 *
ls -F
# test3b.sh* tmp_folder/

# -a 显示所有文件,包括隐藏文件
ls -a
# . npm Documents

# -R 递归显示子目录
ls -F -R
# badtest* nohup.out search.xml search.xml.bak tmp_folder/
#
# ./tmp_folder:
# test1.sh* test10b.out

Displaying a long listing

# Displaying a long list
ls -l
# total 10112
# -rwxr--r-- 1 i306454 staff 159 May 30 15:56 badtest
# -rw------- 1 i306454 staff 138 Jun 2 19:12 nohup.out

long list 显示格式说明

  • The file type, directory(d), regular file(-), linked file(l), character device(c) or block device(b)
  • The file permissions
  • The number of file hard links
  • The file owner username
  • The file primary group name
  • The file byte size
  • The last time file was modified
  • The filename or directory name

long list 是一个比较强力的模式,你可以收集到很多信息

Filtering listing output

过滤文件

ls -l bad*
# -rwxr--r-- 1 i306454 staff 159 May 30 15:56 badtest

可用的过滤符

  • ? 单个字符
  • * 多个字符
  • [] 多选,可以是 [ai], [a-i],[!a]

使用星号的过滤方法也叫做 file globbing
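几个通配符的小实验(在临时目录里随手建几个文件演示,文件名都是示例):

```shell
cd "$(mktemp -d)"
touch badtest bigtest file1 file2 fileA

ls ba?test      # ? 匹配单个字符 -> badtest
ls file[12]     # [] 多选 -> file1 file2
ls file[!1]     # [!] 排除 -> file2 fileA
ls b*           # * 匹配任意长度 -> badtest bigtest
```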

Handling Files

过一下常用的文件处理命令

Creating files

# touch 创建空文件
touch test_one
ls -l test_one
# -rw-r--r-- 1 i306454 staff 0 Jun 4 11:25 test_one

# 可以在不改变文件内容的情况下更新最后改动时间,这个之前倒是不知道
ls -l test_one
# -rw-r--r-- 1 i306454 staff 3 Jun 4 11:27 test_one
touch test_one
ls -l test_one
# -rw-r--r-- 1 i306454 staff 3 Jun 4 11:29 test_one

# -a 只修改最近访问时间
touch -a test_one
# 不过 Mac 不支持这个参数

Copying files

format: cp source destination, copy 的文件是一个全新的文件

# -i 当文件已经存在时,询问是否覆盖
cp -i test_one test_two
# overwrite test_two? (y/n [n]) n
# not overwritten

# -d 只显示文件夹,不显示文件夹内容
ls -Fd tmp_folder
# tmp_folder/
ls -F tmp_folder/
# test1.sh* test10b.out...

Linking files

Linux 中你的文件可以有一个物理主体和多个虚拟链接,这种链接即为 links。系统中有两种链接

  • A symbolic link
  • A hard link

A symbolic link is simply a physical file that points to another file somewhere in the virtual directory structure. The two symbolically linked together files do not share the same contents.

ln -s test_one  sl_test_one
ls -l *test_one
# lrwxr-xr-x 1 i306454 staff 8 Jun 4 12:30 sl_test_one -> test_one
# -rw-r--r-- 1 i306454 staff 3 Jun 4 11:29 test_one

# -i 显示 inode 编号
ls -i *test_one
# 51540816 sl_test_one 51538439 test_one

hard link 是指向同一个 inode 的另一个文件名,你可以通过它直接读写原文件的内容

ln test_two hl_test_two

ls -il *test_two
# 51538882 -rw-r--r-- 2 i306454 staff 3 Jun 4 11:38 hl_test_two
# 51538882 -rw-r--r-- 2 i306454 staff 3 Jun 4 11:38 test_two

Note 创建 hard link 要求链接和原文件在同一个物理分区上,如果在不同的分区,只能创建 symbolic link.

查阅下来发现,符号链接和硬链接最主要的区别有

  • symbolic link 和原文件有不同的 inode, hard link 和原文件相同
  • hard link 起到备份的作用,只有当指向同一个 inode 的所有文件名都删除后,文件才真正被删除
  • symbolic link 保存原文件路径,当原文件删除了,link 的文件内容就消失了
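这几条区别可以用一个小实验验证:删掉原文件后,hard link 的内容还在,symbolic link 就断了(临时目录和文件名均为示例):

```shell
cd "$(mktemp -d)"
echo "some data" > orig
ln orig hardcopy        # hard link:和 orig 指向同一个 inode
ln -s orig softcopy     # symbolic link:只记录 orig 这个路径

rm orig
cat hardcopy            # inode 引用计数未归零,数据还在
# some data
cat softcopy 2>/dev/null || echo "broken link"
# broken link
```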

Renaming files

Renaming files is called moving files. mv won’t change the inode number.

touch file1
mv file1 file2
ls file*
# file2

mv 也支持整个文件夹的迁移,且不需要加任何参数

Deleting files

rm -i file2
# remove file2?

Managing Directories

mkdir New_Dir

# 创建多级文件夹
mkdir -p folder1/folder2/folder3
ls -R folder1
# folder2

# folder1/folder2:
# folder3

# folder1/folder2/folder3:

# rmdir 只能删除空文件夹
rmdir folder1
# rmdir: folder1: Directory not empty

rm -rf folder1

Viewing File Contents

使用 file 瞥一眼文件

file folder1
# folder1: directory
file file2
# file2: empty
file search.xml
# search.xml: XML 1.0 document text, UTF-8 Unicode text, with very long lines, with overstriking
file badtest
# badtest: Bourne-Again shell script text executable, ASCII text

cat 全揽文件

cat tree.txt

# -n 显示行号
cat -n badtest
# 1 #!/usr/local/bin/bash
# 2 # Testing closing file descriptors
# 3
# 4 exec 3> test17file
# 5
# 6 echo "This is a test line of data" >&3
# 7
# 8 exec 3>&-
# 9
# 10 echo "This won't work" >&3
# 11
# 12
# 13

# 只显示 non-blank 的行号
cat -b badtest
# 1 #!/usr/local/bin/bash
# 2 # Testing closing file descriptors

# 3 exec 3> test17file

# 4 echo "This is a test line of data" >&3

# 5 exec 3>&-

# 6 echo "This won't work" >&3

# 使用 ^I 代替 tab, Mac 不支持
cat -T badtest

Using the more command

cat 只能全文显示,more 显示一部分并让你自己选择后面的动作

Using the less command

别被它的名字骗了,其实它是 more 的增强版,取自短语 'less is more'

Viewing parts of a file

tail 默认只显示文件的最后 10 行,-n 指定显示行数 tail -n 2 file

head 默认显示开头 10 行,head -3 file 指定行数。格式和 tail 不统一,真是服了

试了下,这两个命令都可以用 -n 3 和 -3 两种格式,没区别
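两种写法的小验证(用 seq 造一个 20 行的文件):

```shell
cd "$(mktemp -d)"
seq 1 20 > nums

tail -n 3 nums    # 最后 3 行:18 19 20
tail -3 nums      # 老式写法,效果一样
head -n 3 nums    # 开头 3 行:1 2 3
head -3 nums
```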

Chapter 4: More bash Shell Commands

Monitoring Programs

Peeking at the processes

Linux 系统中,用 process 表示运行着的程序。可以用 ps(process status) 命令查看.

默认情况下,ps 只显示当前终端的进程,包含四列:PID,TTY(启动它的终端),TIME(占用的 CPU 时间)和 CMD.

ps 
PID TTY TIME CMD
647 ttys000 0:02.15 -zsh
12163 ttys000 11:26.77 /Users/i306454/SAPDevelop/tools/sapjvm_8/bin/java -Dlog4j.co
12183 ttys000 0:01.12 tail -f /Users/i306454/SAPDevelop/workspace/trunk/tomcat-sfs
1238 ttys001 0:09.65 /bin/zsh -l
9378 ttys002 0:03.04 /bin/zsh --login -i

ps command 有三种类型的参数

  • Unix style parameters
  • BSD style parameters
  • GNU long parameters

Unix-style parameters

简单摘录几个, 而且书上列的只是一部分,主要记住几个常用的就行了

Parameter Description
-A Shows all processes
-N Shows the opposite of the specified parameters
-a Shows all processes except session headers and processes without a terminal
-d Shows all processes except session headers
-e Shows all processes
-f Displays a full format listing
-l Displays a long listing
ps -ef | head
# UID PID PPID C STIME TTY TIME CMD
# 0 1 0 0 10:09AM ?? 1:23.22 /sbin/launchd
# 0 64 1 0 10:09AM ?? 0:03.00 /usr/sbin/syslogd
# ...

ps -l | head
# UID PID PPID F CPU PRI NI SZ RSS WCHAN S ADDR TTY TIME CMD
# 501 647 646 4006 0 31 0 5457412 5200 - S+ 0 ttys000 0:04.03 -zsh
  • UID: The user responsible for launching the process
  • PID: The process ID of the process
  • PPID: The PID of the parent process(if a process is started by another process)
  • C: Processor utilization over the lifetime of the process
  • STIME: The system time when the process started
  • TTY: The terminal device from which the process was launched
  • TIME: The cumulative CPU time required to run the process
  • CMD: The name of the program that was started
  • F: System flags assigned to the process by the kernel
  • S: The state of the process. O-running on processor; S-sleeping; R-runnable, waiting to run; Z-zombie, process terminated but parent not available; T-process stopped;
  • PRI: The priority of the process(higher numbers mean lower priority)
  • NI: The nice value, which is used for determining priorities
  • ADDR: The memory address of the process
  • SZ: Approximate amount of swap space required if the process was swapped out
  • WCHAN: Address of the kernel function where the process is sleeping

其他两种我很少用,先留着吧,有机会再补全

Real-time process monitoring

ps 只能显示一个时间点的 process 状态,如果要实时显示,需要用到 top 命令

top
# PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEIN IDLEW POWE INSTRS CYCLES USER
# 1841 com.docker.h 36.7 68:12.08 13 0 37 19G 0B 629M 1710 1830 sleeping *0[1] 0.00000 0.00000 501 202373842+ 473 569 335 85444924+ 920 48973831+ 17 3949456+ 59.4 469486491 786452501 i306454

Stopping processes

Linux 系统中使用 signals 来和其他 process 交互。常用的 signals 列表

Signal Name Description
1 HUP Hangs up
2 INT Interrupts
3 QUIT Stops running
9 KILL Unconditionally terminates
11 SEGV Produces segment violation
15 TERM Terminates if possible
17 STOP Stops unconditionally, but doesn’t terminate
18 TSTP Stops or pauses, but continues to run in background
19 CONT Resumes execution after STOP or TSTP

The kill command 只有 process 的 owner 或者 root user 有权限杀死进程。 格式:kill 3904

The killall command 可以根据名字关闭多个进程 killall http*
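一个安全的 kill 小实验:只杀自己起的后台进程(sleep 的秒数和信号都是示例):

```shell
sleep 100 &
pid=$!                   # 记下后台进程的 PID
kill -15 $pid            # 发送 TERM(15),等价于 kill -s TERM $pid
wait $pid 2>/dev/null    # 回收进程,避免僵尸进程
kill -0 $pid 2>/dev/null || echo "process is gone"
```

`kill -0` 不发送真正的信号,只检测进程是否还在,适合在脚本里做存活判断。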

Monitoring Disk Space

Mounting media

# 显示当前挂载的设备
mount
# /dev/disk1s1s1 on / (apfs, sealed, local, read-only, journaled)
# devfs on /dev (devfs, local, nobrowse)
# ...

显示信息:

  • The device filename of the media
  • The mount point in the virtual directory where the media is mounted
  • The filesystem type
  • The access status of the mounted media

手动挂载,你需要是 root 或者用 sudo,格式为 mount -t type device directory, sample mount -t vfat /dev/sdb1 /media/disk

type 指定了设备的文件类型,如果你想要和 Windows 下共享这个设备,你最好使用下面这些文件类型

  • vfat: Windows long filesystem
  • ntfs: Windows advanced filesystem used in Windows NT, XP and Vista
  • iso9660: The standard CD-ROM filesystem

umount [directory | device] 卸载,如果卸载时还有 process 在这个设备上运行,系统会阻止你

Using the df command

当你想要看看磁盘还有多少可用空间时。。。

df command allows you to easily see what’s happening on all the mounted disks

df - display free disk space

df -h
# Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
# /dev/disk1s1s1 932Gi 14Gi 749Gi 2% 553757 9767424403 0% /
# devfs 190Ki 190Ki 0Bi 100% 656 0 100% /dev
# ...

Using the du command

df 是查看磁盘整体的使用信息,du 是查看目录下各文件占用的空间

The du command shows the disk usage or a specific directory(by default, the current directory)

du - display disk usage statistics

# 说是文件也会显示,为什么我这里看不到。。。
du .
# 104 ./tmp_folder
# 0 ./folder1/folder2/folder3
# 0 ./folder1/folder2
# 0 ./folder1

一些可选参数

  • -c: 统计结果
  • -h: 方便阅读的结果
  • -s: Summarizes each argument
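这几个参数的效果(临时目录和文件内容均为示例):

```shell
dir=$(mktemp -d)
echo "hello" > "$dir/a.txt"
echo "world" > "$dir/b.txt"

du -c "$dir" | tail -1   # -c 在末尾追加一行 total
du -sh "$dir"            # -s 只给汇总,-h 人类可读(K/M/G)
```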

Working with Data Files

列出一些处理大量数据时用到的工具

Sorting data

cat file1
# one
# two
# three
# four
# five
sort file1
# five
# four
# one
# three
# two
cat file2
# 1
# 2
# 100
# 45
# 3
# 10
# 145
# 75
sort file2
# 1
# 10
# 100
# 145
# 2
# 3
# 45
# 75

# 使用 -n 指定数字排序
sort -n file2
# 1
# 2
# 3
# 10
# 45
# 75
# 100
# 145

cat file3
# Apr
# Aug
# Dec
# Feb
# Jan
# Jul
# Jun
# Mar
# May
# Nov
# Oct
# Sep

# 按月份排序
sort -M file3
# Jan
# Feb
# Mar
# Apr
# May
# Jun
# Jul
# Aug
# Sep
# Oct
# Nov
# Dec

其他比较常见的参数

  • -t 指定分割符
  • -k 指定排序的列
  • -r 倒序
# 当前文件夹下的文件按大小倒序排列
du -sh * | sort -nr
# 52K tmp_folder
# 4.0K tree.txt
# 4.0K test_thr
# ...

Searching for data

grep [options] pattern [file]

一些有趣的可选参数

  • -v 挑选不 match 的那些
  • -n 行号
  • -o 只显示匹配的内容
  • -c 显示匹配的数量
  • -e 多个匹配 grep -e t -e f file1
  • 使用正则 grep [tf] file1
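拿前面 sort 一节那个 file1(one~five)把这些参数都过一遍(文件在这里重新造一份,保证示例自包含):

```shell
cd "$(mktemp -d)"
printf 'one\ntwo\nthree\nfour\nfive\n' > file1

grep t file1           # two three
grep -v t file1        # one four five
grep -n t file1        # 2:two 3:three
grep -c t file1        # 2
grep -e t -e f file1   # two three four five
grep '[tf]' file1      # 同上,正则写法
```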

Compressing data

Linux 系统中的压缩工具

Utility File Extension Description
bzip2 .bz2 Uses the Burrows-Wheeler block sorting text compression algorithm and Huffman coding
compress .Z Original Unix file compression utility; starting to fade away into obscurity
gzip .gz The GNU Project's compression utility; uses Lempel-Ziv coding
zip .zip The Unix version of the PKZIP program for Windows

gzip 是 Linux 中使用度最高的压缩工具,它由三部分组成

  • gzip for compressing files
  • gzcat for displaying the contents of compressed text files
  • gunzip for uncompressing files
gzip file1
ls -l file1*
# -rw-r--r-- 1 i306454 staff 50 Jun 4 13:57 file1.gz
gzip file*
# gzip: file1.gz already has .gz suffix -- unchanged
ls -l file*
# -rw-r--r-- 1 i306454 staff 50 Jun 4 13:57 file1.gz
# -rw-r--r-- 1 i306454 staff 46 Jun 4 13:58 file2.gz
# -rw-r--r-- 1 i306454 staff 34 Jun 4 14:02 file3.gz
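一个压缩、查看、解压的完整回路(gzcat 在有些发行版上叫 zcat,这里用更通用的 gunzip -c;文件名是示例):

```shell
cd "$(mktemp -d)"
echo "hello gzip" > myfile

gzip myfile             # 生成 myfile.gz,原文件被替换掉
ls
# myfile.gz
gunzip -c myfile.gz     # 不解压,直接查看内容
# hello gzip
gunzip myfile.gz        # 解压,恢复 myfile
```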

Archiving data

虽然 zip 挺好用,但是 Linux 上用的最多的还是 tar command. tar 本来是用来归档到 tape device 的,但是它也能用来归档到文件,后来还变得越来越受欢迎了

tar function [options] object1 object2

The tar Command Functions

Function Long Name Description
-A --concatenate Appends an existing tar archive file to another existing tar archive file
-c --create Creates a new tar archive file
-d --diff Checks the differences between a tar archive file and the filesystem
   --delete Deletes from an existing tar archive file
-r --append Appends files to the end of an existing archive file
-t --list Lists the contents of an existing tar archive file
-u --update Appends files to an existing tar archive file that are newer than a file with the same name in the existing archive
-x --extract Extracts files from an existing archive file

The tar Command Options

Option Description
-C dir Changes to the specified directory
-f file Output results to file(or device)
-j Redirects output to the bzip2 command for compression
-p Preserves all file permissions
-v Lists files as they are processed
-z Redirects the output to the gzip command for compression
# -c create new tar file
# -v list process file
# -f output result to file
tar -cvf test.tar tmp_folder/
# a tmp_folder
# a tmp_folder/test11.sh
# a tmp_folder/test2.sh
# a ...

ls test*
# test.tar

# 并不会解压,只是看看
# -t list contents in tar
tar -tf test.tar
# tmp_folder/
# tmp_folder/test11.sh
# tmp_folder/test2.sh
# ...

# 解压
tar -xvf test.tar
# x tmp_folder/
# x tmp_folder/test11.sh
# ...

Tip 网上下的包很多都是 .tgz 格式的,是 gzipped tar files 的意思,可以用 tar -zxvf filename.tgz
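把 -z 和 -c/-t/-x 串起来的一个小回路(目录和文件名是随手起的示例):

```shell
cd "$(mktemp -d)"
mkdir data
echo "hi" > data/f.txt

tar -zcf data.tgz data   # -z 经 gzip 压缩,-c 创建,-f 指定输出文件
tar -ztf data.tgz        # -t 只列出内容,不解压
rm -rf data
tar -zxf data.tgz        # -x 解开,目录恢复
cat data/f.txt
# hi
```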

Chapter 5: Understanding the Shell

这章将学习一些 shell process 相关的知识,子 shell 和 父 shell 的关系等

Exploring Shell Types

你默认启动的 shell 是配置在 /etc/passwd 文件中的

cat /etc/passwd
# ...
# root:*:0:0:System Administrator:/var/root:/bin/sh
# ...
ls -lF /bin/sh
# -rwxr-xr-x 1 root wheel 120912 Jan 1 2020 /bin/sh*

# 其他一些自带的 sh
ls -lF /bin/*sh
# -r-xr-xr-x 1 root wheel 1296704 Jan 1 2020 /bin/bash*
# -rwxr-xr-x 1 root wheel 1106144 Jan 1 2020 /bin/csh*
# -rwxr-xr-x 1 root wheel 277440 Jan 1 2020 /bin/dash*
# -r-xr-xr-x 1 root wheel 2585424 Jan 1 2020 /bin/ksh*
# -rwxr-xr-x 1 root wheel 120912 Jan 1 2020 /bin/sh*
# -rwxr-xr-x 1 root wheel 1106144 Jan 1 2020 /bin/tcsh*
# -rwxr-xr-x 1 root wheel 1347856 Jan 1 2020 /bin/zsh*

Exploring Parent and Child Shell Relationships

ps -f               
# UID PID PPID C STIME TTY TIME CMD
# 501 667 665 0 10:10AM ttys000 0:03.74 -zsh
# 501 1454 1433 0 10:10AM ttys001 0:00.99 /bin/zsh -l
# 501 2027 637 0 10:11AM ttys002 0:00.38 /bin/zsh --login -i

# 在 zsh 中启动一个 bash
bash

ps -f
# UID PID PPID C STIME TTY TIME CMD
# 501 667 665 0 10:10AM ttys000 0:03.74 -zsh
# 501 1454 1433 0 10:10AM ttys001 0:01.04 /bin/zsh -l
# 501 12146 1454 0 4:07PM ttys001 0:00.01 bash
# 501 2027 637 0 10:11AM ttys002 0:00.38 /bin/zsh --login -i
# 可以看到新建了一个 bash process, 它的 PPID 就是 /bin/zsh 的 PID

上面的例子中,bash 就是 zsh 的子 shell,它会复制一部分父 shell 的环境变量,这会导致一些小问题,第 6 章会介绍。子 shell 也叫 subshell,subshell 里还可以再建 subshell。ps --forest 可以显示树状结构,不过貌似 Mac 不支持

Looking at process lists

一行运行多个 cmd,使用 semicolon 分割:pwd ; ls ; cd /etc ; pwd ; cd ; pwd ; ls,但这还不算 process list。将整条命令用括号包裹之后,会启动 subshell 运行,这才是 process list:(pwd ; ls ; cd /etc ; pwd ; cd ; pwd ; ls)。和这个语法相似的还有 { command; },但它不会启动 subshell。可以通过打印 $BASH_SUBSHELL 变量来验证

(pwd ; ls ; cd /etc ; pwd ; cd ; pwd ; ls ; echo $BASH_SUBSHELL)
# ...
# 1
pwd ; ls ; cd /etc ; pwd ; cd ; pwd ; ls ; echo $BASH_SUBSHELL
# 0
(pwd; (echo $BASH_SUBSHELL))
# /Users/i306454
# 2

Creatively using subshells

Investigating background mode

sleep - 等待 x 秒

# & 符号设置后台运行
sleep 3000 &
# [1] 12603
ps
# 12391 ttys001 0:00.03 bash
# 12603 ttys001 0:00.00 sleep 3000

Putting process lists into the background

a process list is a command or series of commands executed within a subshell.

(sleep 2 ; echo $BASH_SUBSHELL ; sleep 2)
# 1
(sleep 2 ; echo $BASH_SUBSHELL ; sleep 2) &
# [2] 12658
ps
# 12658 ttys001 0:00.00 bash
# 1

# [2]+ Done ( sleep 2; echo $BASH_SUBSHELL; sleep 2 )

background 运行脚本的好处:your terminal is not tied up with the subshell's I/O

sleep 和 echo 的 sample 只是示范,工作中,你可能会后台执行 tar (tar -cf Rich.tar /home/rich ; tar -cf My.tar /home/christine)&

Looking at co-processing

Co-processing does two things at the same time. coproc 会起一个后台的 job 运行对应的命令

coproc sleep 10
# [2] 12746
jobs
# [1]- Running sleep 3000 &
# [2]+ Done coproc COPROC sleep 10

默认的 coproc 起的 job 名字为 COPROC,你也可以指定名字。注意花括号({)后面要接空格,这是语法规定。一般用默认的名字就行,只有当你需要和它们通信时,才会特别地取一个名字

coproc My_Job { sleep 10; }
# [2] 12848
jobs
# [2]+ Running coproc My_Job { sleep 10; } &

后面还跟了一个 ps --forest 的实验,没法做 ╮( ̄▽ ̄””)╭

Just remember that spawning a subshell can be expensive and slow. Creating nested subshells is even more so!

Understanding Shell Built-In Commands

Built-in commands and non-built-in commands

Looking at external commands

external command 也被叫做 filesystem command, 是在 bash shell 之外的,通常放在 /bin, /usr/bin, /sbin 或者 /usr/sbin

ps 就是一个 external 的 command

which ps 
# /bin/ps
type -a ps
# ps is /bin/ps

每当 external command 执行时,都会创建一个 child process, 这种行为叫做 forking.

Looking at built-in commands

Built-in commands 不需要 child process 就能执行。他们是 shell 工具集的一部分。

type exit
# exit is a shell builtin
type cd
# cd is a shell builtin

他们不需要 fork 或者运行文件,所以他们更快,效率更高。

有些 cmd 有两个版本,which 只会显示 external command

type -a echo 
# echo is a shell builtin
# echo is /bin/echo
which echo
# /bin/echo

Using the history command

显示 cmd 的历史记录

history 
# ...
# 42 code test19
# 43 ./test19
# ...

Tip 设置环境变量 HISTSIZE 改变数量上限

使用 !! 执行上一条命令, bash 的历史记录会存在 .bash_history 文件中,当前 shell 的历史存在内存中,退出后存到文件中,通过 history -a 强制立刻写入文件

Using command aliases

为了简化输入,有了别名(alias)。

# 显示自带的别名
alias -p

alias li='ls -li'
li
# total 8
# 5091820 drwx------@ 3 i306454 staff 96 Aug 20 2020 Applications

Using Linux Environment Variables

Environment variables are set in lots of places on the Linux system, and you should know where these places are.

这章将介绍环境变量存储的位置,怎么创建自己的环境变量,还介绍怎么使用 variable arrays.

Exploring Environment Variables

bash shell 使用 environment variable 存储 shell session 和 工作环境相关的信息。环境变量分两种

  • Global variables
  • Local variables

Looking at global environment variables

Global variables 是所有 shell 都可见的,Local variables 是当前 shell 才可见的。

# 查看 global variables
printenv
# SHELL=/bin/zsh
# LSCOLORS=Gxfxcxdxbxegedabagacad
# PIPENV_VENV_IN_PROJECT=1
# ...

# 输出单个变量
printenv HOME
# /Users/i306454

# env 貌似不能输出单个变量
env HOME
# env: HOME: No such file or directory

# 还可以用 echo
echo $HOME
# /Users/i306454

Looking at local environment variables

Linux 默认为每个 shell 定义基本的 local variables,当然你也可以自定义。系统中并没有只输出本地变量的命令,但是有 set 可以输出 global + local

set
# '!'=0
# '#'=0
# '$'=13377

env vs printenv vs set:

  • set = global + local + user-defined variables, result is sorted
  • env has additional functionality that printenv does not have
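三者的差别可以这样验证(变量名 my_demo 是随手起的;set 的输出格式以 bash 为准):

```shell
my_demo=hello                  # 未 export 的本地变量
set | grep '^my_demo='         # set 能看到本地变量
printenv my_demo || echo "printenv sees nothing before export"

export my_demo
printenv my_demo               # export 之后 printenv 才能看到
# hello
```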

Setting User-Defined Variables

Setting local user-defined variables

echo $my_var

my_var=Hello
echo $my_var
# Hello

# 包含空格的,需要用单/双引号包裹
my_var=Hello world
# bash: world: command not found
my_var='Hello world'
echo $my_var
# Hello world

# 新启一个 bash, 访问不到之前定义的 local variable
bash
echo $my_var
#

user-defined local variable 使用小写,global 的使用大写。Linux 中的变量是区分大小写的。

Setting global environment variables

创建 global variable 的方法:先创建一个 local variable,然后 export 成一个 global environment variable

my_var="I am Gloabl now"
export my_var
bash
echo $my_var
# I am Gloabl now

但是,在 child shell 中修改 global variable 并不会影响到 parent shell 中的值,这个好神奇, 即使用 export 在 subshell 中修改也不行

my_var="Null"
echo $my_var
# Null
exit
# exit
echo $my_var
# I am Gloabl now

bash
export my_var="Null"
echo $my_var
# Null
exit
# exit
echo $my_var
# I am Gloabl now

Removing Environment Variables

使用 unset

echo $my_var
# I am Gloabl now
unset my_var
echo $my_var
#

Tip 当你要对变量本身做操作(赋值、unset)的时候,不需要加 $;当你想要引用变量的值的时候,需要加 $. printenv 除外。
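这条规则的一个最小例子:

```shell
val1=10
val2=$val1     # 引用值:要加 $
echo $val2
# 10
val3=val1      # 忘了 $:存进去的是字符串 "val1"
echo $val3
# val1
```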

和之前的规则一样,当你在 subshell 中 unset 一个 global variable 时,这个 unset 只在 subshell 中生效,parent shell 中变量还是存在的

Uncovering Default Shell Environment Variables

Bash shell 除了自己定义一些环境变量外,还从 Unix Bourne shell 那边继承了一些变量过来。

The bash Shell Bourne Variables

Variable Description
CDPATH A colon-separated list of directories used as a search path for the cd command
HOME The current user’s home directory
IFS A list of characters that separate fields used by the shell to split text strings
MAIL The filename of the current user's mailbox(The bash shell checks this file for new mail.)
MAILPATH A colon-separated list of multiple filenames for the current user’s mailbox(The bash shell checks each file in this list for new mail.)
OPTARG The value of the last option argument processed by the getopt command
OPTIND The index value of the last option argument processed by the getopt command
PATH A colon-separated list of directories where shell looks for commands
PS1 The primary shell command line interface prompt string
PS2 The secondary shell command line interface prompt string

除了这些,bash shell 还提供了一些自定义的变量, 太长了,不列了。

Setting the PATH Environment Variable

当你在终端输入一个 external command 时,系统就会根据 PATH 中的路径找命令. 路径用冒号分割。

echo $PATH
# /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# 添加路径
PATH=$PATH:/home/jack/Scripts

Tips 如果 subshell 中也要用到新加的路径,你就要 export 它。有一个技巧是,可以在 PATH 中添加当前路径 PATH=$PATH:.
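追加 PATH 的效果可以这样演示(目录和脚本名 myhello 都是随手起的示例):

```shell
dir=$(mktemp -d)
cat > "$dir/myhello" <<'EOF'
#!/bin/bash
echo "hello from my script"
EOF
chmod +x "$dir/myhello"

myhello 2>/dev/null || echo "not in PATH yet"
PATH=$PATH:$dir          # 把新目录追加进 PATH
myhello
# hello from my script
```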

Locating System Environment Variables

前面我们介绍了如何使用这些变量,那么怎么将他们做持久化呢。当你启动一个 shell 的时候,系统会到 setup file or environment files 里面去加载这些变量。

你可以通过三种方式启动一个 bash shell:

  • As a default login shell at login time
  • As an interactive shell that is started by spawning a subshell
  • As a non-interactive shell to run a script

Understanding the login shell process

当你登陆系统的时候,bash shell 作为一个 login shell 启动。login shell 会从以下五个文件中加载配置:

  • /etc/profile
  • $HOME/.bash_profile
  • $HOME/.bashrc
  • $HOME/.bash_login
  • $HOME/.profile

/etc/profile 是所有用户登陆时都会执行的文件,其他的几个是用户可以自定义的。

cat /etc/profile
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PAGER=less
export PS1='\h:\w\$ '
umask 022

for script in /etc/profile.d/*.sh ; do
if [ -r $script ] ; then
. $script
fi
done

脚本中循环处理 profile.d 文件夹下的内容,这个文件夹专门用来存放 application-specific startup files that are executed by the shell when you log in.

ls -lF /etc/profile.d/
# total 8
# -rw-r--r-- 1 root root 295 May 30 2020 color_prompt
# -rw-r--r-- 1 root root 61 May 30 2020 locale.sh

自定义位置文件加载顺序如下,第一个被找到后,其他的就不加载了

  • $HOME/.bash_profile
  • $HOME/.bash_login
  • $HOME/.profile

.bashrc 不在其中,因为它会在其他 process 中被调用

Understanding the interactive shell process

当你在终端输入 bash 时,你会启动一个 interactive shell. 当你启动 interactive shell 的时候,它不会加载 /etc/profile 中的内容。它只会 check .bashrc 中的配置。

.bashrc 做两件事

  1. check for a common bashrc file in /etc directory
  2. provides a place for user to enter personal command alias + provide script functions
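对应这两件事,一个典型的 $HOME/.bashrc 骨架大概长这样(示例,具体内容因发行版而异):

```shell
# 1. 先加载全局的 /etc/bashrc(如果存在)
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# 2. 个人的 alias 和 shell 函数放在这里
alias ll='ls -l'
mkcd() { mkdir -p "$1" && cd "$1"; }
```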

Understanding the non-interactive shell process

没遇到过使用场景,先 pass

Making environment variables persistent

将自定义的变量存在 $HOME/.bashrc 是一个极好的习惯

Learning about Variable Arrays

# 定义数组 
mytest=(one two three four five)
echo $mytest
# one
echo ${mytest[2]}
# three

echo ${mytest[*]}
# one two three four five

mytest[2]=seven
echo ${mytest[*]}
# one two seven four five

可以通过 unset 移除某个元素;移除之后 print 不再显示它,但它的位置还是占着的

unset mytest[2]
echo ${mytest[*]}
# one two four five
echo ${mytest[2]}

echo ${mytest[3]}
# four
unset mytest
echo ${mytest[*]}

有时候 arrays 的使用挺复杂的,一般我们不在脚本中使用它,而且兼容性也不是很好。

Understanding Linux File Permissions

Linux Security

Linux 系统的 security 核心是 account 这个概念。每个访问的用户都有一个唯一的账户,权限就是根据账户设置的。下面将介绍一些账户相关的文件和工具包。

The /etc/passwd file

/etc/passwd 文件中存储着一些 UID 相关的信息,root 是管理员账户,有固定的 UID 0.

cat /etc/passwd
# root:*:0:0:System Administrator:/var/root:/bin/sh
# daemon:*:1:1:System Services:/var/root:/usr/bin/false
# _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico

系统还会为一些非用户的 process 创建 account,这些叫做 system accounts.

All services that run in background mode need to be logged in to the Linux system under a system user account.

Linux 将 500 以下的 UID 预留给了 system accounts.

passwd 文件中的信息包括

  • The login name
  • The password for the user
  • The numerical UID of the user account
  • The numerical group ID(GID) of the user account
  • A text description of the user account(called the comment field)
  • The location of the HOME directory for the user
  • The default shell for the user

password 字段为 x, 以前放的还是加密后的 pwd,后来为了安全统一放到 /etc/shadow 下面去了
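
上面这些字段可以用 awk 按冒号切开直观对照一下(示例,取前三条记录):

```shell
# 第 1/3/7 个字段分别是 login name、UID 和默认 shell
awk -F: '{ print $1, $3, $7 }' /etc/passwd | head -3
```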

The /etc/shadow file

只有 root user 可以访问 shadow 文件

# docker bash 中的内容
cat /etc/shadow
# root:!::0:::::
# bin:!::0:::::

# 书上的例子
# rich:$1$.FfcK0ns$f1UgiyHQ25wrB/hykCn020:11627:0:99999:7:::

shadow 中的信息包括

  • The login name corresponding to the login name in the /etc/passwd file
  • The encrypted password
  • The number of days since January 1, 1970, that the password was last changed
  • The minimum number of days before the password can be changed
  • The number of days before the password must be changed
  • The number of days before the password expiration that the user is warned to change the password
  • The number of days after a password expires before the account will be disabled
  • The date(stored as the number of days since January 1, 1970) since the user account was disabled
  • A field reserved for future use

Adding a new user

# 查看 useradd 的默认配置
useradd -D
# GROUP=100
# HOME=/home
# INACTIVE=-1
# EXPIRE=
# SHELL=/bin/sh
# SKEL=/etc/skel
# CREATE_MAIL_SPOOL=no

当你在 useradd 的时候没有指定任何参数的时候,就会按照这个 default 的配置添加新用户。default 包含以下信息

  • The user is added to a common group with group ID 100
  • The new user has a HOME account created in the directory /home/loginname
  • The account can’t be disabled when the password expires
  • The new account can’t be set to expire at a set date
  • The new account uses /bin/sh as the default shell
  • The system copies the contents of the /etc/skel directory to the user’s HOME directory
  • The system creates a file in the mail directory for the user account to receive mail

倒数第二个 item 说的是,在创建用户的时候,admin 可以预先设置一个模版

默认情况下,useradd 并不会为用户创建 HOME 目录,需要添加 -m 参数

ls -l /etc/skel
# total 16
# drwxr-xr-x+ 2 root root 4096 Aug 11 2017 Desktop
# -rw-r--r-- 1 root root 8980 Apr 20 2016 examples.desktop

useradd -m I306454
ls -l /home
# total 28
# drwxr-xr-x+ 5 fuser fuser 4096 Aug 11 2017 fuser
# drwxr-xr-x+ 4 I306454 I306454 4096 Jun 5 14:48 I306454
ls -lF /home/I306454/
# total 16
# drwxr-xr-x+ 2 I306454 I306454 4096 Aug 11 2017 Desktop/
# -rw-r--r-- 1 I306454 I306454 8980 Apr 20 2016 examples.desktop

The useradd Command Line Parameters

Parameter Description
-c comment Adds text to the new user’s comment field
-d home_dir Specifies a different name for the HOME directory other than the login name
-e expire_date Specifies a date, in YYYY-MM-DD format, when the account will expire
-f inactive_days Specifies the number of days after a password expires when the account will be disabled. A value of 0 disables the account as soon as the password expires; a value of -1 disables this feature
-g initial_group Specifies the group name or GID of the user’s login group
-G group Specifies one or more supplementary groups the user belongs to
-k Copies the /etc/skel directory contents into the user’s HOME directory(must use -m as well)
-m Create the user’s HOME directory
-M Doesn’t create a user’s HOME directory(used if the default setting is to create one)
-n Create a new group using the same name as the user’s login name
-r Creates a system account
-p passwd Specifies a default password for the user account
-s shell Specifies the default login shell
-u Specifies a unique UID for the account

如果你有很多默认配置需要修改,那么你可以通过 useradd -D 修改这些默认配置

Removing a user

userdel I306454 默认情况下并不会删除对应用户的 HOME 目录。你需要添加 -r 参数达到这个效果

Modifying a user

Linux 提供了一些不同的工具包来修改用户信息

User account Modification Utilities

Command Description
usermod Edits user account fields, as well as specifying primary and secondary group membership
passwd Changes the password for an existing user
chpasswd Reads a file of login name and password pairs, and updates the passwords
chage Changes the password’s expiration date
chfn Changes the user account’s comment information
chsh Changes the user account’s default shell

感觉 usermod 已经具备了所有账户相关的基本操作了

后面有这些 cmd 的用法简介,但是我暂时用不到,先不摘录了

Using Linux Groups

Group 可以以群的单位管理权限,每个 group 都有特定的 GID

The /etc/group file

cat /etc/group
# ...
# jenkins:x:58116:
# mfe:x:58117:
# I306454:x:58118:

  • The group name
  • The group password
  • The GID
  • The list of user accounts that belong to the group

你可以通过 usermod 命令添加用户进组

Creating new groups

groupadd shared
tail /etc/group
# I306454:x:58118:
# shared:x:58119:

usermod -G shared I306454
tail /etc/group
# shared:x:58119:I306454

PS: 改变用户组时,如果该用户已经 login, 则需要重新 login 使之生效

PPS: 如果你用 -g 新组会代替旧组,如果 -G 则是多个组并存

Modifying groups

groupmod -n sharing shared
tail /etc/group
# sharing:x:58119:I306454

Decoding File Permissions

Using file permission symbols

ls -l 
# total 10192
# -rwxr--r-- 1 i306454 staff 159 May 30 15:56 badtest
# -rw-r--r-- 1 i306454 staff 24 Jun 4 13:57 file1

-rwxr--r-- 即为文件的权限信息

第一个字符表示文件类型

  • - for files
  • d for directories
  • l for links
  • c for character devices
  • b for block devices
  • n for network devices

后面的字符都是权限

  • r for read permission
  • w for write permission
  • x for execute permission
  • - for denied permission

权限三个一组,分别代表 owner/group/everyone, owner 和 group 分别在 ls -l 后面有写出来

Default file permissions

umask 设置了所有文件和目录的默认权限

touch ttt
ls -l ttt
# -rw-r--r-- 1 i306454 staff 0 Jun 5 15:43 ttt
umask
# 0022

umask 结果的第一位表示 sticky bit, 后三位是权限的 octal mode 表示

Permissions Binary Octal Description
--- 000 0 No permissions
--x 001 1 Execute-only permission
-w- 010 2 Write-only permission
-wx 011 3 Write and execute permissions
r-- 100 4 Read-only permission
r-x 101 5 Read and execute permissions
rw- 110 6 Read and write permissions
rwx 111 7 Read, write and execute permissions

文件的 full 权限是 666,文件夹是 777. umask 可以理解为在这个 full 权限的基础上减去一个值。

之前我们 touch 的文件 rw-r--r-- 是 644 = 666 - 022

ls -ld newdir
# drwxr-xr-x 2 i306454 staff 64 Jun 5 15:56 newdir

drwxr-xr-x 755 = 777 - 022
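
严格来说,这个"减法"实际是按位与非(full 权限 AND NOT umask),在常见 umask 值下结果和减法一致。可以用 shell 的算术展开验证:

```shell
# 文件默认权限: 0666 & ~0022 -> 644
printf 'file: %03o\n' $(( 0666 & ~0022 ))
# file: 644

# 目录默认权限: 0777 & ~0022 -> 755
printf 'dir: %03o\n' $(( 0777 & ~0022 ))
# dir: 755
```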

Changing Security Settings

chmod options mode file

ls -l ttt
# -rw-r--r-- 1 i306454 staff 0 Jun 5 15:43 ttt
chmod 760 ttt
ls -l ttt
# -rwxrw---- 1 i306454 staff 0 Jun 5 15:43 ttt

除了数字表示,你也可以用字母表示

[ugoa…][+-=][rwxXstugo…]

  • u for the user

  • g for the group

  • o for others(everyone else)

  • a for all of the above

    • + add perm
    • - subtract perm
    • = set perm

  • X assigns execute permissions only if the object is a directory or if it already has execute permissions

  • s sets the UID or GID on execution

  • t saves program text

  • u sets the permissions to the owner’s permission

  • g sets the permissions to the group’s permission

  • o sets the permissions to the other’s permission

chmod o+r ttt
ls -l ttt
# -rwxrw-r-- 1 i306454 staff 0 Jun 5 15:43 ttt

chmod u-x ttt
ls -l ttt
# -rw-rw-r-- 1 i306454 staff 0 Jun 5 15:43 ttt

Changing ownership

改变文件 owner,比如离开组织的时候,做交接。使用 chown options owner[.group] file

# 只改 owner
chown dan newfile
# 同时改变 owner 和 group
chown dan.shared newfile
# 只改 group
chown .shared newfile

PS: 只有 root 可以改变文件的 owner, 任何 user 都可以改变文件的组,只要这个 user 同时是改变前后两个组的成员

Sharing Files

这个场景没用到过,以后再说

Chapter 8: Managing Filesystems

这章的内容我大致浏览了一下,作为了解即可。他介绍了很多系统类型,ext3 什么的以前见过,但是不明所以,刚好可以学习一下。

Exploring Linux Filesystems

filesystem: 用于存储文件,管理存储设备

Understanding the basic Linux filesystems

最原始的 Linux 文件系统仿造了 Unix 文件系统,下面我们会介绍这些文件系统的发展过程

  • ext(extended filesystem) + inode(track info about files in directory)
  • ext 系统中文件最大只能 2G。 ext2 是 ext 的升级版,最大文件到 32G. 其他的特性就不举例了
  • Journaling filesystems, 貌似叫日志系统,算是文件更新到 inode 前的临时文件
  • ext3, 2001年加入 kernel
  • ext4, 2008年加入 kernel
  • Reiser filesystem, in 2001, Hans Reiser created the first journaling filesystem for Linux, call ReiserFS.
  • Journaled File System(JFS) 可能是最老的 journaling filesystem, IBM 1990 年开发

其他暂时不看了。。。

Chapter 9: Installing Software

Chapter 10: Working with Editors

第 9,10 章也没啥好看的,说的是软件安装和编辑器,跳过

Linux命令行与shell脚本编程大全 3rd 第二部分命令实验记录

Chapter 11: Basic Script building

Using Multiple Commands

使用分号分隔命令

date ; who
# Mon May 24 12:36:11 CST 2021
# i306454 console May 24 11:18
# i306454 ttys000 May 24 11:18
# i306454 ttys003 May 24 11:33

Creating a Script File

  1. 新建文件 mysh.sh
  2. 填入内容
  3. 设置环境
  4. 运行

However, the first line of a shell script file is a special case, and the pound sign followed by the exclamation point tells the shell what shell to run the script under
Shell script 的第一行表示你想要用哪个 Shell 运行你的脚本

#!/bin/bash
# This script displays the date and who's logged on
date
who

尝试运行 mysh.sh,运行失败 bash: mysh.sh: command not found

这时你有两种选择

  1. 将包含脚本的目录添加到 PATH 中,eg: export PATH=$PATH:path_to_folder
  2. 使用相对或绝对路径调用脚本, eg: ./mysh.sh

PS: 发现直接用 sh mysh.sh 即可,还省去了赋权的操作

这里直接使用相对路径调用 ./mysh.sh, 运行失败 bash: ./mysh.sh: Permission denied。通过 ls -l mysh.sh 查看权限, 发现并没有权限,然后赋权,再次尝试,运行成功。

ls -l mysh.sh 
# -rw-r--r-- 1 i306454 staff 70 May 24 12:40 mysh.sh

chmod u+x mysh.sh
./mysh.sh
# Mon May 24 13:11:37 CST 2021
# i306454 console May 24 11:18
# i306454 ttys000 May 24 11:18

Displaying Messages

使用 echo 打印信息

  • 当输出内容中没有什么特殊符号时,可以直接在 echo 后面接你要的内容
  • 当输出内容中包含单/双引号时,需要将 echo 的内容包含在 双/单引号中
  • 默认 echo 是会换行的,使用 echo -n xxx 取消换行

Using Variables

set: 打印当前环境的所有环境变量, 输出内容包括 PATH,HOME 等

set
# ANT_HOME=/Users/i306454/SAPDevelop/tools/apache-ant-1.8.4
# BASH=/usr/local/bin/bash
# BASHOPTS=checkwinsize:cmdhist:complete_fullquote:expand_aliases:extquote:force_fignore:globasciiranges:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
# BASH_ALIASES=()
# ...

在你的 sh 脚本中,可以调用这些变量

#!/bin/bash
echo "User info for userid $USER"
echo UID: $UID
echo HOME: $HOME

sh test_var.sh
# User info for userid i306454
# UID: 501
# HOME: /Users/i306454

当你想在输出语句中打印 $ 这个特殊字符时,只需要在前面加反斜杠

echo "This book cost $15"
# This book cost 5
echo "This book cost \$15"
# This book cost $15

PS: ${variable} 的写法也是合法的,花括号起到界定变量名边界的作用
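
一个体现花括号作用的小例子(变量名 fruit 是随便取的):

```shell
fruit=apple
echo "$fruits"     # 空输出: fruits 被当成了另一个变量名
echo "${fruit}s"   # 花括号界定了变量名边界
# apples
```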

变量名限制:最多 20 个字符,可以是英文、数字或者下划线,区分大小写。使用等号连接,中间不允许有空格

Command substitution

你可以通过一下方式将 cmd 输出的值赋给你的变量

  1. ` 反引号符(backtick character)
  2. $() 表达式
tdate=`date`
echo $tdate
# Mon May 24 15:46:51 CST 2021
tdate2=$(date)
echo $tdate2
# Mon May 24 15:47:39 CST 2021

这只是初级用法,更骚的操作是将定制后的输出用在后续命令中,比如下面这个例子,将文件夹下的所有目录写进 log 文件,并用时间戳做后缀 today=$(date +%y%m%d); ls -al /usr/bin > log.$today

Caution: Command substitution creates what’s called a subshell to run the enclosed command. A subshell is a separate child shell generated from the shell that’s running the script. Because of that, any variables you create in the script aren’t available to the subshell command.

Command substitution 这种使用方法会创建一个子 shell 计算变量值,子 shell 运行的时候是看不到你外面的 shell 中定义的变量的
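
反过来,在子 shell 里新建的变量,外层 shell 也是拿不到的,只能通过输出传值。一个小验证:

```shell
# inner 只存在于 command substitution 的子 shell 中
result=$(inner=hello; echo "$inner")
echo "$result"
# hello
echo "${inner:-unset}"   # 外层看不到 inner
# unset
```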

Redirecting Input and Output

Output redirection

可以使用大于号(greater-than symbol)将输出导入文件

date > test6
ls -l test6
# -rw-r--r-- 1 i306454 staff 29 May 24 18:31 test6
cat test6
# Mon May 24 18:31:40 CST 2021

如果文件已经存在,则覆盖原有内容

who > test6
cat test6
# i306454 console May 24 11:18
# i306454 ttys000 May 24 11:18
# i306454 ttys003 May 24 18:00

使用两个大于号(double greater-than symbol)来做 append 操作

date >> test6
cat test6
# i306454 console May 24 11:18
# i306454 ttys000 May 24 11:18
# i306454 ttys003 May 24 18:00
# Mon May 24 18:35:12 CST 2021

Input redirection

使用小于号(less-than symbol)将文件中的内容导入输入流 command < inputfile

wc < test6
# 4 21 125
# wc: show lines, words and bytes of input

除了从文件导入,从命令行直接导入多行也是可行的,术语叫做 inline input redirection。这个之前在阿里云 setup 环境的时候操作过。使用两个小于号(<<) + 起止描述符实现

wc << EOF
> test String 1
> test String 2
> test String 3
> EOF
# 3 9 42

Pipes

使用方式 command1 | command2

Don’t think of piping as running two commands back to back. The Linux system actually runs both commands at the same time, linking them together internally in the system. As the first command produces output, it’s sent immediately to the second command. No intermediate files or buffer areas are used to transfer the data.
pipe 中的命令是同时执行的,变量传递不涉及到中间变量

ls 
# input.txt mysh.sh output.txt test_echo.sh
# log.210524 out.txt test6 test_var.sh
ls | sort
# input.txt
# log.210524
# mysh.sh
# ...

pipe 可以无限级联 cmd1 | cmd2 | cmd3 | ...
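
一个三级 pipe 的小例子(printf 只是用来造测试数据):

```shell
# 生成多行文本 -> 排序 -> 统计重复次数
printf 'b\na\nb\n' | sort | uniq -c
# a 计数为 1, b 计数为 2
```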

Performing Math

Bash 中提供了两种方式进行计算

The expr command

expr 1 + 5
# 6

expr 支持常规算术运算 + 与或非 + 正则 + 字符串操作等

这个 expr 表达式有点尴尬,它的一些常规运算是要加反斜杠的,简直无情

expr 1 * 6
# expr: syntax error
expr 1 \* 6
# 6

这个还算好的,如果在 sh 文件中调用,表达式就更操蛋了

#!/usr/local/bin/bash
var1=10
var2=20
var3=$(expr $var2 / $var1)
echo The result is $var3

sh test_expr.sh
# The result is 2

Using brackets

Bash 中保留 expr 是为了兼容 Bourne shell,同时它提供了一种更简便的计算方式,$[ operation ]

var1=$[1 + 5]
echo $var1
# 6
var2=$[ $var1 * 2 ]
echo $var2
# 12

方括号表达式可以自动识别计算符号,不需要用反斜杠做转义,唯一的缺陷是,它只能做整数运算

var1=100
var2=45
var3=$[$var1/$var2]
echo $var3
# 2

A floating-point solution

bc: bash calculator, 它可以识别以下内容

  • Number(integer + floating point)
  • Variables(simple variables + arrays)
  • Comments(# or /* … */)
  • Expressions
  • Programming statements(such as if-then statements)
  • Functions
bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
12 * 5.4
64.8
3.156 * (3 + 5)
25.248
quit

可以通过 scale 变量指定小数位数, 默认 scale 为 0, -q 用于跳过启动时的版本和版权说明

bc -q 
3.44 / 5
0
scale=4
3.44 / 5
.6880
quit

如前述,bc 可以识别变量

bc -q
var1=10
var1 * 4
40
var2 = var1 / 5
print var2
2
quit

你可以通过 Command substitution,在 sh 脚本中调用 bc, 形式为 variable=$(echo "options; expression" | bc)

cat test9
# #!/usr/local/bin/bash
# var1=$(echo "scale=4; 3.44/5" | bc)
# echo This answer is $var1

sh test9
# This answer is .6880

脚本中定义的变量也可以使用

cat test10
# #!/usr/local/bin/bash
# var1=100
# var2=45
# var3=$(echo "scale=4; $var1 / $var2" | bc)
# echo The answer for this is $var3
sh test10
# The answer for this is 2.2222

当遇到很长的计算表达式时,可以用 << 将他们串起来

cat test12
# #!/usr/local/bin/bash
# var1=10.46
# var2=43.67
# var3=33.2
# var4=71

# var5=$(bc << EOF
# scale = 4
# a1 = ($var1 * $var2)
# b1 = ($var3 * $var4)
# a1 + b1
# EOF
# )

# echo The final answer for this mess is $var5
sh test12
# The final answer for this mess is 2813.9882

PS: 在上面的脚本中我们用了 Command substitution 所以中间的变量前面都是要加 $ 的,当终端调用 bc 时就不需要了

Exiting the Script

There’s a more elegant way of completing things available to us. 每个命令结束时,系统都会分配一个 0-255 之间的整数给它

Checking the exit status

Linux 使用 $? 表示上一条命令的执行状态

date
# Mon May 24 19:47:29 CST 2021
echo $?
# 0

运行正常,返回 0。如果有问题,则返回一个非零

asd
# bash: asd: command not found
echo $?
# 127

常见错误码表

Code Desc
0 Success
1 General unknown error
2 Misuse of shell command
126 The cmd can’t execute
127 Cmd not found
128 Invalid exit argument
128+x Fatal err with Linux signal x
130 Cmd terminated with Ctrl+C
255 Exit status out of range

The exit command

exit 关键字让你可以定制脚本的返回值

cat test13
# #!/usr/local/bin/bash
# # testing the exit status
# var1=10
# var2=20
# var3=$[$var1 + $var2]
# echo The answer is $var3
# exit 5

chmod u+x test13
./test13
# The answer is 30
echo $?
# 5

PS: 这里看出来差别了,如果我直接用 sh test13 执行的话,结果是 0

这里需要指出的是,exit code 最大为 255 如果超出了,系统会自己做修正

cat test14
# #!/usr/local/bin/bash
# # exit code more than 255
# var=300
# exit $var

chmod u+x test14
./test14
echo $?
# 44

Chapter 12: Using Structured Commands

本章内容主要包括 logic flow control 部分

Working with the if-then Statement

if-then 是最基本的控制方式,format 如下, 判断依据是 command 的 exit code, 如果是 0 则表示 success,其他的则为 fail.

if command
then
command
fi

positive sample 如下

cat test1.sh 
# #!/usr/local/bin/bash
# # testing the if statement
# if pwd
# then
# echo "It worked"
# fi
chmod u+x test1.sh
./test1.sh
# /Users/i306454/gitStore/hexo
# It worked

negative sample 如下

cat test2.sh 
# # #!/usr/local/bin/bash
# # # testing a bad command
# if IamNotACommand
# then
# echo "It worked"
# fi
# echo "We are outside the if statement"
chmod u+x test2.sh
./test2.sh
# ./test2.sh: line 3: IamNotACommand: command not found
# We are outside the if statement

PS: if-then 可以改一下 format,将 if-then 写在一行,看上去更贴近其他语言的表现形式

if command; then
commands
fi

then 中可以写代码段,如下

cat test3.sh
# #!/usr/local/bin/bash
# # testing multiple commands in the then section
# #
# testuser=MySQL
# #
# if grep $testuser /etc/passwd
# then
# echo "This is my first command"
# echo "This is my second command"
# echo "I can even put in other commands beside echo:"
# ls -a /home/$testuser/.b*
# fi
./test3.sh
# _mysql:*:74:74:MySQL Server:/var/empty:/usr/bin/false
# This is my first command
# This is my second command
# I can even put in other commands beside echo:
# ls: /home/MySQL/.b*: No such file or directory

PS:由于是 mac 系统,输出有点出入,但是目的还是达到了。顺便测了一下缩进,去掉 echo 前的缩进也一样 work

Exploring the if-then-else Statement

格式如下,

if command
then
commands
else
commands
fi

改进 test3.sh 如下

# #!/usr/local/bin/bash
# # testing multiple commands in the then section
# #
# testuser=NoSuchUser
# #
# if grep $testuser /etc/passwd
# then
# echo "This is my first command"
# echo "This is my second command"
# echo "I can even put in other commands beside echo:"
# ls -a /home/$testuser/.b*
# else
# echo "The user $testuser does not exist on this system."
# echo
# fi
./test4.sh
# The user NoSuchUser does not exist on this system.

Nesting if

if command1
then
commands
elif command2
then
more commands
fi

写脚本检测帐户是否存在,然后检测用户文件夹是否存在

cat test5.sh
# #!/usr/local/bin/bash
# # testing nested ifs - use elif
# #
# testuser=NoSuchUser
# #
# if grep $testuser /etc/passwd
# then
# echo "The user $testuser exists on this system"
# elif ls -d /home/$testuser
# then
# echo "The user $testuser does not exist on this system"
# echo "However, $testuser has a directory."
# fi
chmod u+x test5.sh
./test5.sh
# ls: /home/NoSuchUser: No such file or directory

Tips: Keep in mind that, with an elif statement, any else statements immediately following it are for that elif code block. They are not part of a preceding if-then statement code block.(elif 之后紧跟的 else 是一对的,它不属于前面的 if-then)

多个 elif 串连的形式

if command1
then
command set 1
elif command2
then
command set 2
elif command3
then
command set 3
elif command4
then
command set 4
fi

Trying the test Command

if-then 条件判断只支持 exit code,为了使它更通用,Linux 提供了 test 工具集,如果 test 判定结果为 TRUE 则返回 0 否则非 0,格式为 test condition, 将它和 if-then 结合,格式如下

if test condition
then
commands
fi

如果没有写 condition,test 命令默认返回非 0

cat test6.sh 
# #!/usr/local/bin/bash
# # Testing the test command
# #
# if test
# then
# echo "No expression return a True"
# else
# echo "No expression return a False"
# fi
./test6.sh
# No expression return a False

将测试条件替换为变量,输出 True 的语句

cat test6.sh 
# #!/usr/local/bin/bash
# # Testing the test command
# my_variable="Full"
# #
# if test $my_variable
# then
# echo "The $my_variable expression return a True"
# else
# echo "The $my_variable expression return a False"
# fi
./test6.sh
# The Full expression return a True

# replace my_variable=""
./test6.sh
# The expression return a False

测试条件还可以简写成如下形式

Be careful; you must have a space after the first bracket and a space before the last bracket, or you’ll get an error message.

if [ condition ]
then
commands
fi

test condition 可以测试如下三种情景

  • Numeric comparisons
  • String comparisons
  • File comparisons

Using numeric comparisons

Comparison Description
n1 -eq n2 Check if n1 is equal to n2
n1 -ge n2 Check if n1 is greater than or equal to n2
n1 -gt n2 Check if n1 is greater than n2
n1 -le n2 Check if n1 is less than or equal to n2
n1 -lt n2 Check if n1 is less than n2
n1 -ne n2 Check if n1 is not equal to n2

test condition 对变量也是有效的,示例如下

cat numeric_test.sh 
# #!/usr/local/bin/bash
# # Using numeric test evaluations
# value1=10
# value2=11
# #
# if [ $value1 -gt 5 ]
# then
# echo "The test value $value1 is greater than 5"
# fi
# #
# if [ $value1 -eq $value2 ]
# then echo "The values are equal"
# else
# echo "The values are different"
# fi
./numeric_test.sh
# The test value 10 is greater than 5
# The values are different

但是 test condition 有个缺陷,它不能测试浮点型数据

value1=5.55
[ $value1 -gt 5 ]; echo $?
# bash: [: 5.55: integer expression expected
# 2

Caution: Remember that the only numbers the bash shell can handle are integers.

Using string comparisons

Comparison Description
str1 = str2 Check if str1 is the same as string str2
str1 != str2 Check if str1 is not the same as string str2
str1 < str2 Check if str1 is less than str2
str1 > str2 Check if str1 is greater than string str2
-n str1 Check if str1 has a length greater than zero
-z str1 Check if str1 has a length of zero
cat test8.sh 
# #!/usr/local/bin/bash
# # Testing string equality
# testuser=baduser
# #
# if [ $USER != $testuser ]
# then
# echo "This is $testuser"
# else
# echo "Welcome $testuser"
# fi
./test8.sh
# This is baduser

在处理 <> 时,shell 会有一些很奇怪的注意点

  • <> 必须使用转义符号,不然系统会把他们当作流操作
  • <> 的用法和 sort 中的用法是不一致的

针对第一点的测试:测试中,比较符号被当成了流操作符,新生成了一个 hockey 文件。这个操作 exit code 是 0 所以执行了 true 的分支。你需要将条件语句改为 if [ $val1 \> $val2 ] 才能生效

cat badtest.sh 
# #!/usr/local/bin/bash
# # mis-using string comparisons
# val1=baseball
# val2=hockey
# #
# if [ $val1 > $val2 ]
# then
# echo "$val1 is greater than $val2"
# else
# echo "$val1 is less than $val2"
# fi
./badtest.sh
# baseball is greater than hockey
ls
# badtest.sh hockey

针对第二点,sort 和 test 对 string 的比较是相反的

cat test9b.sh 
# #!/usr/local/bin/bash
# # testing string sort order
# val1=Testing
# val2=testing
# #
# if [ $val1 \> $val2 ]
# then
# echo "$val1 is greater than $val2"
# else
# echo "$val1 is less than $val2"
# fi
./test9b.sh
Testing is less than testing

sort << EOF
> Testing
> testing
> EOF
Testing
testing

PS: 这里和书本上有出入,我在 MacOS 里测试两者是一致的,大写要小于小写,可能 Ubuntu 上不一样吧,有机会可以测一测

Note: The test command and test expressions use the standard mathematical comparison symbols for string compari-sons and text codes for numerical comparisons. This is a subtle feature that many programmers manage to get reversed. If you use the mathematical comparison symbols for numeric values, the shell interprets them as string values and may not produce the correct results.

test condition 的处理模式是 数字 + test codes(-eq); string + 运算符(</>/=),刚好是交叉的,便于记忆
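
用一个对照示例帮助记忆(数字 9 和 10 在两种比较下结果刚好相反):

```shell
[ 9 -lt 10 ] && echo "numeric: 9 < 10"     # 数值比较用 text code
# numeric: 9 < 10
[ "9" \< "10" ] || echo "string: 9 > 10"   # 字符串按字典序,"9" 反而大
# string: 9 > 10
```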

用 -n 和 -z 进行测试,undefined 的变量默认长度为 0

cat test10.sh 
# #!/usr/local/bin/bash
# # testing string length
# val1=testing
# val2=''
# #
# if [ -n $val1 ]
# then
# echo "The string '$val1' is not empty"
# else
# echo "The string '$val1' is empty"
# fi
# #
# if [ -z $val2 ]
# then
# echo "The string '$val2' is empty"
# else
# echo "The string '$val2' is not empty"
# fi
# #
# if [ -z $val3 ]
# then
# echo "The string '$val3' is empty"
# else
# echo "The string '$val3' is not empty"
# fi
./test10.sh
# The string 'testing' is not empty
# The string '' is empty
# The string '' is empty

Using file comparisons

Comparison Description
-d file Check if file exists and is a directory
-e file Check if file or directory exists
-f file Check if file exists and is a file
-r file Check if file exists and is readable
-s file Check if file exists and is not empty
-w file Check if file exists and is writable
-x file Check if file exists and is executable
-O file Check if file exists and is owned by the current user
-G file Check if file exists and the default group is the same as the current user
file1 -nt file2 Check if file1 is newer than file2
file1 -ot file2 Check if file1 is older than file2

测试范例如下

mkdir test_folder
[ -d test_folder ] && echo true || echo false
# true
[ -e test_folder ] && echo true || echo false
# true
[ -e xxx ] && echo true || echo false
# false

chmod u+xrw test_file.sh
ls -l test_file.sh
# -rwxr--r-- 1 i306454 staff 190 May 25 17:51 test_file.sh
[ -r test_file.sh ] && echo true || echo false
# true
[ -w test_file.sh ] && echo true || echo false
# true
[ -x test_file.sh ] && echo true || echo false
# true

chmod u-rxw test_file.sh
ls -l test_file.sh
# ----r--r-- 1 i306454 staff 190 May 25 17:51 test_file.sh
[ -r test_file.sh ] && echo true || echo false
# false
[ -w test_file.sh ] && echo true || echo false
# false
[ -x test_file.sh ] && echo true || echo false
# false

touch tmp_file
[ -s tmp_file ] && echo true || echo false
# false
echo new line >> tmp_file
[ -s tmp_file ] && echo true || echo false
# true

ls -l
# total 8
# -rw-r--r-- 1 i306454 staff 9 May 25 17:59 tmp_file
# -rw-r--r-- 1 i306454 staff 0 May 25 18:01 tmp_file2
[ tmp_file2 -nt tmp_file ] && echo true || echo false
# true
[ tmp_file -nt tmp_file2 ] && echo true || echo false
# false
[ tmp_file -ot tmp_file2 ] && echo true || echo false
# true
[ tmp_file2 -ot tmp_file ] && echo true || echo false
# false

Considering Compound Testing

组合条件

  • [ condition1 ] && [ condition2 ]
  • [ condition1 ] || [ condition2 ]
[ -f tmp_file ] && [ -d $HOME ] && echo true || echo false
# true

Working with Advanced if-then Features

if-then 的增强模式

  • Double parentheses for mathematical expressions(双括号)
  • Double square brackets for advanced string handling functions(双方括号)

Using double parentheses

双括号是针对算术运算的

test command 只提供了简单的算术运算,双括号提供的能力更强,效果和其他语言类似,格式 (( expression )). 除了 test 支持的运算,它还支持如下运算

Comparison Description
val++ Post-increment
val-- Post-decrement
++val Pre-increment
--val Pre-decrement
! Logical negation
~ Bitwise negation
** Exponentiation
<< Left bitwise shift
>> Right bitwise shift
& Bitwise Boolean AND
| Bitwise Boolean OR
&& Logical AND
|| Logical OR

测试范例如下

# ** 次方操作
(( $val1**2 > 90 )) && echo true || echo false
# true
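
自增自减也可以试一下(bash 语法,双括号内的变量不需要 $ 前缀):

```shell
val=5
(( val++ ))
echo $val
# 6
(( val > 0 && val < 10 )) && echo in-range
# in-range
```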

Using double bracket

双方括号是针对字符运算的,格式为 [[ expression ]]. 除了 test 相同的计算外,他还额外提供了正则的支持

Note: bash 是支持双方括号的,但是其他 shell 就不一定了

[[ $USER == i* ]] && echo true || echo false
# true

PS: 双等号(==)表示 string 匹配 pattern,直接用单个等号也是可以的
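
除了 == 匹配 pattern,bash 3 以上的双方括号还支持 =~ 做正则匹配(一个小示例):

```shell
name=user123
[[ $name =~ ^[a-z]+[0-9]+$ ]] && echo matched
# matched
```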

Considering the case Command

对应 Java 中的 switch-case 语法, 格式如下. 当内容和 pattern 匹配时,就会执行对应的语句

1
2
3
4
5
case variable in
pattern1 | pattern2) command1;;
pattern3) command2;;
*) default commands;;
esac
cat test26.sh 
# #!/usr/local/bin/bash
# # using the case command
# case $USER in
# rich | barbara)
# echo "Welcome $USER"
# echo "Please enjoy your visit";;
# i306454)
# echo "Special testing account";;
# jessica)
# echo "Do not forget to log off when you're done";;
# *)
# echo "Sorry, you are not allowed here";;
# esac
./test26.sh
# Special testing account

Issues

Issue1: 写脚本的时候,发现一个很奇怪的问题

# 未赋值的变量 -n 会返回 true ?!
[ -n $ret23 ] && echo true || echo false
# true
# 经多方查证,需要加引号
[ -n "$ret23" ] && echo true || echo false
# false

# 这里就体现出增强型的好处了
[[ -n $ret23 ]] && echo true || echo false
# false

以后可以的话都用增强型吧,容错率更高

Chapter 13: More Structured Commands

这章介绍了其他一些流程控制的关键词

The for Command

for var in list
do
commands
done

PS: for 和 do 也可以写一起(for var in list; do),和 if-then 那样

Reading values in a list

cat test1.sh 
# #!/usr/local/bin/bash
# # basic for command
# for test in a b c d e
# do
# echo The character is $test
# done
./test1.sh
# The character is a
# The character is b
# The character is c
# The character is d
# The character is e

当 for 结束后变量还会存在

cat test1b.sh 
# #!/usr/local/bin/bash
# # Testing the for variable after the looping
# for test in a b c d e
# do
# echo The character is $test
# done

# echo The last character is $test

# test=Connecticut
# echo "Wait, now we're visiting $test"
./test1b.sh
# The character is a
# The character is b
# The character is c
# The character is d
# The character is e
# The last character is e
# Wait, now we're visiting Connecticut

Reading complex values in a list

当 list 中包含一些标点时,结果可能就不是预期的那样了

cat badtest1.sh 
# #!/usr/local/bin/bash
# # another example of how not to use the for command
# for test in I don't know if this'll work, append some thing more?
# do
# echo The character is $test
# done
./badtest1.sh
# The character is I
# The character is dont know if thisll
# The character is work,
# The character is append
# The character is some
# The character is thing
# The character is more?

解决方案:

  1. 给单引号加转义符(for test in I don\'t know if this\'ll work, append some thing more?)
  2. 将含单引号的单词用双引号包裹(for test in I "don't" know if "this'll" work, append some thing more?)

for 默认使用空格做分割,如果想要连词,你需要将对应的词用双引号包裹起来

cat badtest2.sh
# #!/usr/local/bin/bash
# # another example of how not to use the for command

# for test in Nevada New Hampshire New Mexico New York North Carolina
# do
# echo "Now going to $test"
# done
./badtest2.sh
# Now going to Nevada
# Now going to New
# Now going to Hampshire
# Now going to New
# Now going to Mexico
# Now going to New
# Now going to York
# Now going to North
# Now going to Carolina

# update to: for test in Nevada "New Hampshire" "New Mexico" "New York"
./badtest2.sh
# Now going to Nevada
# Now going to New Hampshire
# Now going to New Mexico
# Now going to New York

Reading a list from a variable

cat test4.sh
# #!/usr/local/bin/bash
# # using a variable to hold the list
# list="Alabama Alaska Arizona Arkansas Colorado"
# list=$list" Connecticut"
# for state in $list
# do
# echo "Have you ever visited $state?"
# done
./test4.sh
# Have you ever visited Alabama?
# Have you ever visited Alaska?
# Have you ever visited Arizona?
# Have you ever visited Arkansas?
# Have you ever visited Colorado?
# Have you ever visited Connecticut?

PS: list=$list" Connecticut" 是 shell 中 append 字符串的常见操作
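
bash 还支持 += 做追加,效果一样(小示例):

```shell
list="Alabama Alaska Arizona"
list=$list" Colorado"    # 书中的拼接写法
list+=" Connecticut"     # bash 的 += 追加写法
echo "$list"
# Alabama Alaska Arizona Colorado Connecticut
```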

Reading values from a command

结合其他命令,计算出 list 的值

echo a b c > states
cat test5.sh
# #!/usr/local/bin/bash
# # reading values from a file
# file=states
# for state in $(cat $file)
# do
# echo "Visit beautiful $state"
# done
./test5.sh
# Visit beautiful a
# Visit beautiful b
# Visit beautiful c

Changing the field separator

有一个特殊的环境变量叫做 IFS(internal field separator). 它可以作为分割 field 的依据。默认的分割符有

  • A space
  • A tab
  • A newline

如果你想要将换行作为分割符,你可以使用 IFS=$'\n'

Caution: 定制 IFS 之后一定要还原

测试环节:如何打印当前 IFS 的值?

echo -n "$IFS" | hexdump
# 0000000 20 09 0a
# 0000003
printf %q "$IFS"
# ' \t\n'

Caution: 变量和引号之间的关系:

  • 单引号,所见即所得。写什么即是什么
  • 双引号,中间的变量会做计算
  • 没符号,用于连续的内容,如果内容中带空格,需要加双引号
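
三种情况的小对照(变量名 var 是随便取的):

```shell
var=world
echo '$var'          # 单引号: 原样输出
# $var
echo "$var"          # 双引号: 变量被展开
# world
echo 'hello '"$var"  # 两种引号片段可以直接拼接
# hello world
```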

脚本中 IFS.OLD=$IFS 的赋值语句经常会抛异常 ./test_csv.sh: line 3: IFS.OLD=: command not found 但是写成 IFS_OLD 的话就可以运行,这是因为 bash 变量名中不允许出现点号,IFS.OLD= 整体被当成了一个命令。以后为了稳定,还是用下划线的形式吧

IFS_OLD=$IFS
IFS=$'\n'
<use the new IFS value in code>
IFS=$IFS_OLD

其他定制分割符 IFS=:, 或者多分割符 IFS=$'\n':;"
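
用冒号分割符解析 PATH 风格字符串的示意(path_like 是造的测试数据):

```shell
path_like="usr/bin:usr/local/bin:opt/bin"
IFS_OLD=$IFS
IFS=:                       # 冒号作为分割符
for dir in $path_like; do   # 不加引号,展开时才会按 IFS 分词
  echo "$dir"
done
IFS=$IFS_OLD                # 用完立刻还原
```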

Reading a directory using wildcards

cat test6.sh
# #!/usr/local/bin/bash
# # iterate through all the files in a directory
# for file in /Users/i306454/tmp/*
# do
# if [ -d "$file" ]
# then
# echo "$file is a directory"
# elif [ -f "$file" ]
# then
# echo "$file is a file"
# fi
# done
./test6.sh
# /Users/i306454/tmp/backup is a directory
# /Users/i306454/tmp/bash_test is a directory
# /Users/i306454/tmp/csv is a directory
# /Users/i306454/tmp/dfile is a directory
# /Users/i306454/tmp/ifenduser.sh is a file
# /Users/i306454/tmp/plantuml is a directory
# /Users/i306454/tmp/sh is a directory

PS: one interesting detail in this example: inside the test, the variable file is wrapped in double quotes. That's because file and directory names containing spaces are legal on Linux, and without the quotes the test would be parsed incorrectly
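
A small demo of what the quotes protect against (the temp directory here is just for illustration):

```shell
dir=$(mktemp -d)
touch "$dir/my file"          # a legal name containing a space
for file in "$dir"/*
do
    if [ -f "$file" ]         # quoted: one argument, test works
    then
        echo "$file is a file"
    fi
    # [ -f $file ] would expand to two words and confuse test
done
rm -rf "$dir"
```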

Caution: It’s always a good idea to test each file or directory before trying to process it.

The C-Style for command

The for loop in C looks like this

The C language for command

for (i=0; i<10; i++)
{
printf("The next number is %d\n", i);
}

bash provides a similar construct. The syntax is for (( variable assignment ; condition ; iteration process )), for example: for (( a=1; a<10; a++ ))

Notable differences from the regular bash syntax:

  • The assignment of the variable value can contain spaces
  • The variable in the condition isn't preceded with a dollar sign
  • The equation for the iteration process doesn't use the expr command format

This style feels familiar to me, though it does differ from the bash assignment syntax covered earlier: inside the double parentheses, indentation and spacing are unrestricted

cat test8.sh 
# #!/usr/local/bin/bash
# # Testing the C-style for loop
# for ((i=1; i<= 3; i++))
# do
# echo "The next number is $i"
# done
./test8.sh
# The next number is 1
# The next number is 2
# The next number is 3

Using multiple variables

A for loop driving multiple variables

cat test9.sh 
# #!/usr/local/bin/bash
# # Testing the C-style for loop
# for ((a=1, b=10; a<= 3; a++, b--))
# do
# echo $a - $b
# done
./test9.sh
# 1 - 10
# 2 - 9
# 3 - 8

The while Command

while test command
do
other commands
done

Example

cat test10.sh 
# #!/usr/local/bin/bash
# # while command test
# var1=3
# while [ $var1 -gt 0 ]
# do
# echo $var1
# var1=$[ $var1 - 1 ]
# done
./test10.sh
# 3
# 2
# 1

Using multiple test commands

A while loop can test several commands, but only the exit code of the last one decides whether the loop continues. Even if I replace the first command with a bogus cmd that errors on every pass, the loop runs as usual

Also, each test command goes on its own line, though separating them with semicolons works too

cat test11.sh 
# #!/usr/local/bin/bash
# # Testing a multicommand while loop
# var1=3
# while echo $var1
# [ $var1 -gt 0 ]
# do
# echo "This is inside the loop"
# var1=$[ $var1 - 1 ]
# done
./test11.sh
# 3
# This is inside the loop
# 2
# This is inside the loop
# 1
# This is inside the loop
# 0

The until Command

Semantically the opposite of while, but used the same way

until test commands
do
other commands
done
cat test12 
# #!/usr/local/bin/bash
# # using the until command
# var1=100
# until [ $var1 -eq 0 ]
# do
# echo $var1
# var1=$[ $var1 - 25 ]
# done
./test12
# 100
# 75
# 50
# 25

Nesting Loops

Nested loops; a very common pattern

cat test14             
# #!/usr/local/bin/bash
# # nesting for loops
# for (( a=1; a<=3; a++))
# do
# echo "Starting loop $a:"
# for (( b=1; b<=3; b++ ))
# do
# echo " Inside loop: $b"
# done
# done
./test14
# Starting loop 1:
# Inside loop: 1
# Inside loop: 2
# Inside loop: 3
# Starting loop 2:
# Inside loop: 1
# Inside loop: 2
# Inside loop: 3
# Starting loop 3:
# Inside loop: 1
# Inside loop: 2
# Inside loop: 3

A while + for example. In the book's version the inner bound is written $var2<3;, which differs from the syntax shown earlier. I tried it: the condition works both with and without the $

cat test14
# #!/usr/local/bin/bash
# # nesting for loops
# var1=3
# while [ $var1 -ge 0 ]
# do
# echo "Outer loop: $var1"
# for (( var2=1; var2<3; var2++))
# do
# var3=$[ $var1*$var2 ]
# echo " Inner loop: $var1 * $var2 = $var3"
# done
# var1=$[ $var1 - 1 ]
# done
./test14
# Outer loop: 3
# Inner loop: 3 * 1 = 3
# Inner loop: 3 * 2 = 6
# Outer loop: 2
# Inner loop: 2 * 1 = 2
# Inner loop: 2 * 2 = 4
# Outer loop: 1
# Inner loop: 1 * 1 = 1
# Inner loop: 1 * 2 = 2
# Outer loop: 0
# Inner loop: 0 * 1 = 0
# Inner loop: 0 * 2 = 0

Looping on File Data

I found a good Stack Overflow post explaining IFS=$'\n'. In one sentence: the $'...' syntax lets you write escape sequences
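
The difference is easy to see by counting bytes; '\n' is two literal characters, while $'\n' is one real newline:

```shell
printf '%s' '\n'  | wc -c     # 2 bytes: backslash + n
printf '%s' $'\n' | wc -c     # 1 byte: an actual newline
```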

You will often need to iterate over the contents of a file, which combines two techniques

  1. Using nested loops
  2. Changing the IFS environment variable

By adjusting IFS you can process lines that contain spaces. Below is an example that parses the /etc/passwd file

cat test1
# #!/usr/local/bin/bash
# # changing the IFS value
# IFS.OLD=$IFS
# IFS=$'\n'
# for entry in $(cat /etc/passwd)
# do
# echo "Values in $entry -"
# IFS=:
# for value in $entry
# do
# echo " $value"
# done
# done
./test1
# Values in _oahd:*:441:441:OAH Daemon:/var/empty:/usr/bin/false
# _oahd
# badtest1.sh
# badtest2.sh
# test1
# test12
# test14
# test15
# 441
# 441
# OAH Daemon
# /var/empty
# /usr/bin/false
# ...

Controlling the loop

Control the flow with break and continue

The break command

break exits a single loop level. It works in any loop construct: for, while, until, and so on

cat test17
# #!/usr/local/bin/bash
# # Breaking out of a for loop
# for var1 in 1 2 3 4 5
# do
# if [ $var1 -eq 3 ]
# then
# break
# fi
# echo "Iteration number: $var1"
# done
# echo "The for loop is completed"
./test17
# Iteration number: 1
# Iteration number: 2
# The for loop is completed

Breaking out of an inner loop

cat test19
# #!/usr/local/bin/bash
# # Breaking out of an inner loop
# for (( a=1; a<4; a++ ))
# do
# echo "Outer loop: $a"
# for (( b=1; b<4; b++ ))
# do
# if [ $b -eq 2 ]
# then
# break
# fi
# echo " Inner loop: $b"
# done
# done
./test19
# Outer loop: 1
# Inner loop: 1
# Outer loop: 2
# Inner loop: 1
# Outer loop: 3
# Inner loop: 1

Breaking out of the outer loop from within the inner one. This feature felt quite novel; I haven't seen it in Java, haha

break n defaults to n=1, breaking the current loop; with 2 it breaks out of the enclosing level. In the example below, break 2 inside the inner for exits the outer for directly

cat test20 
# #!/usr/local/bin/bash
# # Breaking out of an outer loop
# for (( a=1; a<4; a++ ))
# do
# echo "Outer loop: $a"
# for (( b=1; b<4; b++ ))
# do
# if [ $b -gt 2 ]
# then
# break 2
# fi
# echo " Inner loop: $b"
# done
# done
./test20
# Outer loop: 1
# Inner loop: 1
# Inner loop: 2

The continue command

continue ends the current iteration early and moves on to the next one. In the example below, printing is skipped while 3 < x < 8. It applies to all the loop constructs covered so far: for, while, and until

cat test21 
# #!/usr/local/bin/bash
# # Using the continue command
# for (( var1=1; var1<10; var1++ ))
# do
# if [ $var1 -gt 3 ] && [ $var1 -lt 8 ]
# then
# continue
# fi
# echo "Iteration number: $var1"
# done
./test21
# Iteration number: 1
# Iteration number: 2
# Iteration number: 3
# Iteration number: 8
# Iteration number: 9

Like break, continue supports continue n to skip at a given loop level. In the test case, printing is skipped when the outer variable satisfies 2 < x < 4

cat test22
# #!/usr/local/bin/bash
# # Continuing an outer loop
# for (( a=1; a<=5; a++ ))
# do
# echo "Iteration $a:"
# for (( b=1; b<3; b++))
# do
# if [ $a -gt 2 ] && [ $a -lt 4 ]
# then
# continue 2
# fi
# var3=$[ $a * $b ]
# echo " The result of $a * $b is $var3"
# done
# done
./test22
# Iteration 1:
# The result of 1 * 1 is 1
# The result of 1 * 2 is 2
# Iteration 2:
# The result of 2 * 1 is 2
# The result of 2 * 2 is 4
# Iteration 3:
# Iteration 4:
# The result of 4 * 1 is 4
# The result of 4 * 2 is 8
# Iteration 5:
# The result of 5 * 1 is 5
# The result of 5 * 2 is 10

Processing the Output of a Loop

The output printed inside a for loop can be redirected by attaching a file operator after done. I had no idea this existed... my old printf-based scripts look a little silly now

cat test23
# #!/usr/local/bin/bash
# # redirecting the for output to a file
# for (( a=1; a<=5; a++ ))
# do
# echo "The number is $a"
# done > test23.txt
cat test23.txt
# The number is 1
# The number is 2
# The number is 3
# The number is 4
# The number is 5

PS: I tried it; echo -n works here as well

Likewise, done can be followed by other commands, such as a pipe. A great extension

cat test24
# #!/usr/local/bin/bash
# # piping a loop to another command
# for state in "North Dakota" Connecticut Illinois Alabama Tennessee
# do
# echo "$state is the next place to go"
# done | sort
# echo "This completes our travels"
./test24
# Alabama is the next place to go
# Connecticut is the next place to go
# Illinois is the next place to go
# North Dakota is the next place to go
# Tennessee is the next place to go
# This completes our travels

Practical Examples

Some practical script examples

Finding executable files

Walk the directories in PATH to list all the commands you can run

cat test25
# #!/usr/local/bin/bash
# # Finding file in the PATH
# IFS.OLD=$IFS

# IFS=:
# for folder in $PATH
# do
# echo "$folder:"
# for file in $folder/*
# do
# if [ -x $file ]
# then
# echo " $file"
# fi
# done
# done

# IFS=$IFS.OLD
./test25 | more
# /Users/i306454/.jenv/bin:
# /usr/local/sbin:
# /usr/local/sbin/unbound
# ....

PS: that more at the end is a deft touch!!

Creating multiple user accounts

Write the new users to a file, then have a script parse the file and create the accounts in bulk

#!/bin/bash
# Process new user accounts

input="users.csv"
while IFS=',' read -r userid name
do
echo "adding $userid"
useradd -c "$name" -m $userid
done < "$input"
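
The loop expects a two-column CSV. Here is a dry run against hypothetical sample data, with echo in place of useradd (which needs root):

```shell
# hypothetical users.csv content: userid,full name
printf '%s\n' 'jsmith,John Smith' 'adoe,Anna Doe' > users.csv

while IFS=',' read -r userid name
do
    echo "would add $userid ($name)"
done < users.csv
# would add jsmith (John Smith)
# would add adoe (Anna Doe)
```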

Chapter 14: Handling User Input

This chapter covers how to make scripts interactive

Passing Parameters

Reading parameters

bash assigns all the arguments passed in to positional parameters. These variables start with $: $0 is the script name, $1 the first argument, and so on

Compute the factorial of the argument passed in

cat test1
# #!/usr/local/bin/bash
# # Using one command line parameter
# factorial=1
# for (( number=1; number<=$1; number++ ))
# do
# factorial=$[ $factorial * $number ]
# done
# echo The factorial of $1 is $factorial
./test1 5
# The factorial of 5 is 120

An example with multiple arguments

cat test2
# #!/usr/local/bin/bash
# # Testing two command line parameters
# total=$[ $1 * $2 ]
# echo The first parameter is $1
# echo The second parameter is $2
# echo The total value is $total

./test2 2 5
# The first parameter is 2
# The second parameter is 5
# The total value is 10

A string as an argument

cat test3
# #!/usr/local/bin/bash
# # Testing string parameters
# echo Hello $1, glad to meet you

bash-5.1$ ./test3 jack
# Hello jack, glad to meet you

If a string argument contains spaces, wrap it in quotes
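
Calling the test3 script both ways shows why; without quotes the second word lands in $2 and is silently ignored. A sketch using a throwaway copy of the script:

```shell
cat > /tmp/greet << 'EOF'
#!/bin/bash
echo Hello $1, glad to meet you
EOF
chmod +x /tmp/greet

/tmp/greet jack zheng       # Hello jack, glad to meet you
/tmp/greet "jack zheng"     # Hello jack zheng, glad to meet you
rm -f /tmp/greet
```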

When there are more than 9 arguments, you must reference them with braces

cat ./test4
# #!/usr/local/bin/bash
# # handling lots of parameters
# total=$[ ${10} * ${11} ]
# echo The tenth parameter is ${10}
# echo The eleventh parameter is ${11}
# echo The total is $total
./test4 1 2 3 4 5 6 7 8 9 10 11 12
# The tenth parameter is 10
# The eleventh parameter is 11
# The total is 110

Reading the script name

$0 holds the name of the script file

cat test5
# #!/usr/local/bin/bash
# # Testing the $0 parameter
# echo the zero parameter is set to: $0

bash test5
# the zero parameter is set to: test5
./test5
# the zero parameter is set to: ./test5
bash /Users/i306454/tmp/bash_test/test5
# the zero parameter is set to: /Users/i306454/tmp/bash_test/test5

Different ways of invoking the script give different values for parameter zero. To always get just the file name, use the basename command

cat test5b
# #!/usr/local/bin/bash
# # Using basename with the $0 parameter
# name=$(basename $0)
# echo the zero parameter is set to: $name

bash-5.1$ ./test5b
# the zero parameter is set to: test5b
sh test5b
# the zero parameter is set to: test5b

Testing parameters

If a script needs a parameter that wasn't supplied, it throws an error; we can handle this case more gracefully

cat test7 
# #!/usr/local/bin/bash
# # Testing parameters before use
# if [ -n "$1" ]
# then
# echo Hello $1, glad to meet you.
# else
# echo "Sorry, you did not identify yourself."
# fi

./test7 jack
# Hello jack, glad to meet you.
./test7
# Sorry, you did not identify yourself.

Using Special Parameter Variables

Counting parameters

$# gives the number of parameters

cat test8
# #!/usr/local/bin/bash
# # Getting the number of parameters
# echo There were $# parameters supplied.
./test8
# There were 0 parameters supplied.
./test8 123
# There were 1 parameters supplied.
./test8 1 2 3
# There were 3 parameters supplied.
./test8 "jack zheng"
# There were 1 parameters supplied.

Building on this, let's get creative and try to grab the last parameter

cat badtest1
# #!/usr/local/bin/bash
# # Testing grabbing last parameter
# echo The last parameter was ${$#}
./badtest1 1 2 3
# The last parameter was 13965

The attempt fails: syntactically, a $ is not allowed inside the braces. Use an exclamation mark instead to express the same idea

# #!/usr/local/bin/bash
# # Testing grabbing last parameter
# echo The last parameter was ${!#}

./badtest1 1 2 3
# The last parameter was 3

Grabbing all the data

You can grab all the parameters with $* or $@. The difference:

  • $* treats all the parameters as a single word
  • $@ treats the parameters like an array, as separate words, which means you can iterate over them in a for loop
cat test11
# #!/usr/local/bin/bash
# # Testing $* and $@
# echo
# echo "Using the \$* method: $*"
# echo "Using the \$@ method: $@"

./test11 a b c d

# Using the $* method: a b c d
# Using the $@ method: a b c d

They look identical here... the difference only shows when combined with for

cat test12
# #!/usr/local/bin/bash
# # Testing $* and $@
# echo
# count=1
# #
# for param in "$*"
# do
# echo "\$* Parameter #$count = $param"
# count=$[ $count + 1 ]
# done
# #
# echo
# count=1
# #
# for param in "$@"
# do
# echo "\$@ Parameter #$count = $param"
# count=$[ $count + 1 ]
# done
./test12 a b c d

# $* Parameter #1 = a b c d

# $@ Parameter #1 = a
# $@ Parameter #2 = b
# $@ Parameter #3 = c
# $@ Parameter #4 = d

Being Shifty

The shift keyword moves the parameters to the left, by one position by default.

PS: note that the value for variable $0, the program name, remains unchanged

PPS: Be careful when working with the shift command. When a parameter is shifted out, its value is lost and can’t be recovered.

cat test13
#!/usr/local/bin/bash
# Demonstrating the shift command
echo
count=1
while [ -n "$1" ]
do
echo "Parameter #$count = $1"
count=$[ $count + 1 ]
shift
done

./test13 a b c d

# Parameter #1 = a
# Parameter #2 = b
# Parameter #3 = c
# Parameter #4 = d

Testing a multi-position shift

cat test14
#!/usr/local/bin/bash
# Demonstrating a multi-position shift
echo
echo "The original parameters: $*"
shift 2
echo "The changed parameters: $*"
echo "Here is the new first parameter: $1"

./test14 1 2 3 4

# The original parameters: 1 2 3 4
# The changed parameters: 3 4
# Here is the new first parameter: 3

Working with Options

This section introduces three ways to handle options. Options, as the name suggests, are a command's optional dash-prefixed parameters.

Finding your options

Processing options one at a time: use case plus shift to recognize options. List the expected options as case patterns, then match each $1 against them

cat test15
#!/usr/local/bin/bash
# Extracting command line options as parameters
#
echo
while [ -n "$1" ]
do
case "$1" in
-a) echo "Found the -a option";;
-b) echo "Found the -b option";;
-c) echo "Found the -c option";;
*) echo "$1 is not an option";;
esac
shift
done

./test15 -a -b -c -d asd

# Found the -a option
# Found the -b option
# Found the -c option
# -d is not an option
# asd is not an option

Handling options and parameters separately: we can manually insert a separator, such as --, between the options and the parameters to mark where the options end and the parameters begin, then recognize and handle it explicitly in the script.

PS: I just realized that $0 is not included in $* or $@
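
Easy to confirm with set, which replaces the positional parameters of the current shell:

```shell
set -- a b c      # positional parameters are now a, b, c
echo $#           # 3: $0 is not counted
echo "$*"         # a b c: $0 is not included either
```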

In the example below, without the -- every argument is handled in the first loop; with it, the arguments are processed across the two loops

cat test16
#!/usr/local/bin/bash
# Extracting options and parameters
#
echo
while [ -n "$1" ]
do
case "$1" in
-a) echo "Found the -a option";;
-b) echo "Found the -b option";;
-c) echo "Found the -c option";;
--) shift
break;;
*) echo "$1 is not an option";;
esac
shift
done
#
count=1
for param in $@
do
echo "Parameter #$count: $param"
count=$[ $count + 1 ]
done

./test16 -c -a -b test1 test2 test3

# Found the -c option
# Found the -a option
# Found the -b option
# test1 is not an option
# test2 is not an option
# test3 is not an option

./test16 -c -a -b -- test1 test2 test3
# Found the -c option
# Found the -a option
# Found the -b option
# Parameter #1: test1
# Parameter #2: test2
# Parameter #3: test3

Handling options with values: some commands take options with values, e.g. ./testing.sh -a test1 -b -c -d test2. The script then has to pick up the value that belongs to each option

In the example below, -b test1 is an option with a value; as soon as -b is recognized we grab $2 as its value

PS: however you look at it, handling options in bash is cumbersome. What if an option takes several values? Even more logic to add...

cat test17
#!/usr/local/bin/bash
# Extracting command line options and values
#
echo
while [ -n "$1" ]
do
case "$1" in
-a) echo "Found the -a option";;
-b) param="$2"
echo "Found the -b option, with parameter value $param"
shift ;;
-c) echo "Found the -c option";;
--) shift
break;;
*) echo "$1 is not an option";;
esac
shift
done
#
count=1
for param in $@
do
echo "Parameter #$count: $param"
count=$[ $count + 1 ]
done

./test17 -a -b test1 -d

# Found the -a option
# Found the -b option, with parameter value test1
# -d is not an option

Using the getopt command

Introducing the getopt utility, which makes parsing the incoming arguments easier

Looking at the command format: getopt accepts a series of options and parameters and returns them in a normalized format. The syntax is getopt optstring parameters

Tip: getopt has an enhanced sibling, getopts, covered in a later section

Testing getopt: the colon after b marks it as an option that takes a value. If the input contains an undefined option, getopt prints an error message; to suppress it, invoke getopt with -q

getopt ab:cd -a -b test1 -cd test2 test3
# -a -b test1 -c -d -- test2 test3

getopt ab:cd -a -b test1 -cde test2 test3
# getopt: illegal option -- e
# -a -b test1 -c -d -- test2 test3

getopt -q ab:cd -a -b test1 -cde test2 test3
# -a -b 'test1' -c -d -- 'test2' 'test3'

PS: the getopt shipped with macOS does not support -q! I used docker to get around that limitation, heh ╮( ̄▽ ̄””)╭

Using getopt in your scripts: the trick is to combine getopt with set: set -- $(getopt -q ab:cd "$@")

#!/usr/local/bin/bash
# Extracting command line options and values with getopt
#
set -- $(getopt ab:cd "$@")
# dropped -q for compatibility with the macOS version of getopt
# set -- $(getopt -q ab:cd "$@")
#
echo
while [ -n "$1" ]
do
case "$1" in
-a) echo "Found the -a option";;
-b) param="$2"
echo "Found the -b option, with parameter value $param"
shift ;;
-c) echo "Found the -c option";;
--) shift
break;;
*) echo "$1 is not an option";;
esac
shift
done
#
count=1
for param in $@
do
echo "Parameter #$count: $param"
count=$[ $count + 1 ]
done

./test18 -ac

# Found the -a option
# Found the -c option

./test18 -a -b test1 -cd test2 test3 test4

# Found the -a option
# Found the -b option, with parameter value test1
# Found the -c option
# -d is not an option
# Parameter #1: test2
# Parameter #2: test3
# Parameter #3: test4

./test18 -a -b test1 -cd "test2 test3" test4

# Found the -a option
# Found the -b option, with parameter value test1
# Found the -c option
# -d is not an option
# Parameter #1: test2
# Parameter #2: test3
# Parameter #3: test4

PS: the last example shows that getopt does not handle quoted strings well: "test2 test3" was split apart during parsing. Fortunately, there is a way around this

Advancing to getopts

getoptsgetopt 的增强版本,格式如下 getopts optstring variable, optstring 以冒号开始

As shown below, getopts

  • automatically assigns each option to the opt variable for us
  • lets the case patterns drop the leading -
  • provides the built-in $OPTARG holding an option's value
  • replaces any undefined option with a question mark
cat test19
#!/usr/local/bin/bash
# Simple demonstration of the getopts command
#
echo
while getopts :ab:c opt
do
case "$opt" in
a) echo "Found the -a option" ;;
b) echo "Found the -b option, with value $OPTARG" ;;
c) echo "Found the -c option" ;;
*) echo "Unknown option: $opt" ;;
esac
done

./test19 -ab test1 -c

# Found the -a option
# Found the -b option, with value test1
# Found the -c option

./test19 -b "test1 test2" -a

# Found the -b option, with value test1 test2
# Found the -a option

./test19 -d

# Unknown option: ?

getopts also maintains a built-in OPTIND variable, incremented automatically as each option is processed. OPTIND starts at 1; to get at the parameter part, use shift $[ $OPTIND - 1 ]

The example below prints OPTIND as it changes:

cat test20
#!/usr/local/bin/bash
# Processing options and parameters with getopts
#
echo
echo start index: "$OPTIND"
while getopts :ab:cd opt
do
case "$opt" in
a) echo "Found the -a option" ;;
b) echo "Found the -b option, with value $OPTARG" ;;
c) echo "Found the -c option" ;;
d) echo "Found the -d option" ;;
*) echo "Unknown option: $opt" ;;
esac
echo changing index: "$OPTIND"
done
#
echo
echo opt index: "$OPTIND"
shift $[ $OPTIND -1 ]
#
echo
count=1
for param in "$@"
do
echo "Parameter $count: $param"
count=$[ $count + 1 ]
done

./test20 -a -b test1 -d test2 test3 test4

# Found the -a option
# Found the -b option, with value test1
# Found the -d option

# opt index: 5

# Parameter 1: test2
# Parameter 2: test3
# Parameter 3: test4

Standardizing Options

Common conventions for option letters in the shell

Option Description
-a Shows all objects
-c Produces a count
-d Specifies a directory
-e Expands an object
-f Specifies a file to read data from
-h Displays a help message for the command
-i Ignores text case
-l Produces a long format version of the output
-n Uses a non-interactive(batch) mode
-o Specifies an output file to redirect all output
-q Run in quiet mode
-r Processes directories and files recursively
-s Runs in silent mode
-v Produces verbose output
-x Excludes an object
-y Answers yes to all questions

Getting User Input

bash provides the read command for taking user input

Reading basics

read can take input from the keyboard or from a file

cat test21
#!/usr/local/bin/bash
# Testing the read command
#
echo -n "Enter your name: "
read name
echo "Hello $name, welcome to my program."

./test21
# Enter your name: jack
# Hello jack, welcome to my program.

./test21
# Enter your name: jack zheng
# Hello jack zheng, welcome to my program.

In the example above, the command treats the entire input as a single variable

Reading input with a user prompt

cat test22
#!/usr/local/bin/bash
# Testing the read -p option
#
read -p "Please enter your age: " age
days=$[ $age * 365 ]
echo "That makes you over $days days old! "

./test22
# Please enter your age: 5
# That makes you over 1825 days old!

In contrast with the first experiment, the command can also treat the input as a list

cat test23
#!/usr/local/bin/bash
# Testing the read command
#
read -p "Enter your name: " first last
echo "Checking data for $last, $first"

./test23
# Enter your name: jack zheng
# Checking data for zheng, jack

If you don't name a variable for read, bash stores the input in the $REPLY environment variable

cat test24
#!/usr/local/bin/bash
# Testing the REPLY Environment variable
#
read -p "Enter your name: "
echo
echo "Hello $REPLY, welcome to my program."

./test24
# Enter your name: jack zheng

# Hello jack zheng, welcome to my program.

Timing out

By default read blocks forever waiting for input, but we can give it a timeout

cat test25
#!/usr/local/bin/bash
# Timing the data entry
#
if read -t 5 -p "Please enter your name: " name
then
echo "Hello $name, welcome to my script"
else
echo
echo "Sorry, too slow! "
fi

./test25
# Please enter your name:
# Sorry, too slow!
./test25
# Please enter your name: jack
# Hello jack, welcome to my script

read can also cap the input length; once you type that many characters, it immediately moves on

cat test26
#!/usr/local/bin/bash
# Getting just one character of input
#
read -n1 -p "Do you want to continue [Y/N]? " answer
case $answer in
Y | y) echo
echo "fine, continue on..." ;;
N | n) echo
echo OK, goodbye
exit ;;
esac
echo "This is the end of the script"

./test26
# Do you want to continue [Y/N]? y
# fine, continue on...
# This is the end of the script

./test26
# Do you want to continue [Y/N]? n
# OK, goodbye

Reading with no display

When entering sensitive information you don't want echoed on screen, use the -s option

cat test27
#!/usr/local/bin/bash
# hiding input data from the monitor
#
read -s -p "Enter your password: " pass
echo
echo "Is your password really $pass? "

./test27
# Enter your password:
# Is your password really jack?

Reading from a file

On Linux, the read command can read a file line by line. When the file is exhausted, read returns nonzero

cat test28
#!/usr/local/bin/bash
# Reading data from a file
#
count=1
cat test | while read line
do
echo "Line $count: $line"
count=$[ $count + 1 ]
done
echo "Finished processing the file"

cat test
# line 1
# line 2
./test28
# Line 1: line 1
# Line 2: line 2
# Finished processing the file

PS: if the last line of the file has no trailing newline, that line is not processed
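
A common workaround: on that final unterminated line read returns nonzero but still fills the variable, so test the variable as well:

```shell
printf 'line 1\nline 2' > /tmp/nonl    # note: no trailing newline
while read line || [ -n "$line" ]
do
    echo "Got: $line"
done < /tmp/nonl
rm -f /tmp/nonl
# Got: line 1
# Got: line 2
```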

Presenting Data

This chapter shows you more techniques for handling output streams

Understanding Input and Output

So far we have mostly displayed output in two ways

  • printing to the terminal
  • redirecting to a file

Until now we could only send the whole output to a file or the screen; in the sections below we'll try handling the streams separately

Standard file descriptors

Linux refers to each file object through a file descriptor, a unique non-negative integer. Each process may have up to nine file descriptors open at a time. bash reserves 0, 1, and 2 for specific purposes

File descriptor Abbreviation Description
0 STDIN Standard input
1 STDOUT Standard output
2 STDERR Standard error

STDIN is standard input, e.g. keyboard input at the terminal or file input via <. Many bash commands accept input from STDIN; cat, for instance, reads from the keyboard when no file is given

$ cat 
thi
thi
this
this

STDOUT: the shell's standard output is the terminal monitor.

STDERR: when a command fails, this descriptor carries the error output

Redirecting errors

Redirecting error only

As the table above shows, STDERR's file descriptor is 2; put that number in front of the redirection symbol to redirect only the errors

ls -al badfile 2> test4
cat test4
# ls: badfile: No such file or directory

In the example below, badfile does not exist, so the error message goes into test5; test4 exists, so its listing shows on screen

ls -al test4 badfile 2> test5
# -rw-r--r-- 1 i306454 staff 0 May 29 14:07 test4
cat test5
# ls: badfile: No such file or directory

Redirecting errors and data

To capture both normal and error output in files, specify two redirections

ls 
# test5
ls -al test5 badfile 2> test6 1> test7
ls
# test5 test6 test7
cat test6
# ls: badfile: No such file or directory
cat test7
# -rw-r--r-- 1 i306454 staff 39 May 29 14:08 test5

To send both kinds of output to the same file, bash provides a special redirection symbol, &>

ls -al test5 badfile &> test8
cat test8
# ls: badfile: No such file or directory
# -rw-r--r-- 1 i306454 staff 39 May 29 14:08 test5

Redirecting Output in Scripts

With STDOUT and STDERR you can direct output to any file descriptors. There are two ways to redirect output in a script

  • Temporarily redirecting each line
  • Permanently redirecting all commands in the script

Temporary redirections

The book's description here is a bit clumsy, so let me go straight to an example. Suppose you want your echo output to go to the stream STDERR points at, essentially printing your own error log. Use the >&2 form

In test8 below, the first echo is sent to STDERR and the second to STDOUT. Invoked directly, both streams land on the terminal by default, so there is no visible difference; once I redirect errors to test9, the difference appears: only the error message goes to test9

cat test8
#!/usr/local/bin/bash
# Testing STDERR messages

echo "This is an error" >&2
echo "This is normal output"

./test8
This is an error
This is normal output

./test8 2> test9
# This is normal output
cat test9
# This is an error

Permanent redirections

The approach above suits the occasional log line. If you have many lines to redirect, do it this way instead

exec applies the redirection to the running shell for the rest of the script; in the example below everything the script sends to STDOUT goes into the testout file

cat test10
#!/usr/local/bin/bash
# Redirecting all output to a file
exec 1>testout

echo "This is a test of redirecting all output"
echo "from a script to another file."
echo "Without having to redirect every individual line"

./test10
cat testout
# This is a test of redirecting all output
# from a script to another file.
# Without having to redirect every individual line

You can do this mid-script too. In the example below we first direct errors to the testerror file, print two normal lines, then redirect normal output to a file, and finally print one more line to the error file

cat test11
#!/usr/local/bin/bash
# Redirecting output to different locations

exec 2> testerror

echo "This is the start of the script"
echo "now redirecting all output to another location"

exec 1>testout

echo "This output should go to the testout file"
echo "but this should go to the testerror file" >&2

./test11
# This is the start of the script
# now redirecting all output to another location
cat testout
# This output should go to the testout file
cat testerror
# but this should go to the testerror file

Once you've changed STDOUT or STDERR, changing them back is not so easy. If you need to switch streams back and forth there is a trick, covered below in the Creating Your Own Redirection section

Redirecting Input in Scripts

Just like the output streams, the input stream can be manipulated with a redirection symbol: exec 0< testfile

In the example below, the script takes its input from a file instead of the keyboard

cat test12
#!/usr/local/bin/bash
# Redirecting file input

exec 0< test12
count=1

while read line
do
echo "Line #$count: $line"
count=$[ $count + 1 ]
done

./test12
# Line #1: #!/usr/local/bin/bash
# Line #2: # Redirecting file input
# Line #3:
# Line #4: exec 0< test12
# Line #5: count=1
# Line #6:
# Line #7: while read line
# Line #8: do
# Line #9: echo "Line #$count: $line"
# Line #10: count=$[ $count + 1 ]
# Line #11: done
# Line #12:

Creating Your Own Redirection

A shell allows at most nine file descriptors; 0, 1, and 2 are taken, so below we use 3 through 8 as our own file descriptors.

Creating output file descriptors

cat test13
#!/usr/local/bin/bash
# Using an alternative file descriptor

exec 3> test13out

echo "This should display on the monitor"
echo "and this should be stored in the file" >&3
echo "Then this should be back on the monitor"

./test13
# This should display on the monitor
# Then this should be back on the monitor
cat test13out
# and this should be stored in the file

The concept sounds complicated but is actually quite direct, and the usage matches the default file descriptors seen earlier.

Redirecting file descriptors

The example below switches a redirection back and forth. First, exec 3>&1 saves a copy of STDOUT in descriptor 3; then descriptor 1 is pointed at a file, so every echo goes to the file. Afterwards exec 1>&3 restores it and echo prints to the screen again. The whole trick is the n>&m syntax, which makes descriptor n point where m points; to restore, flip the order

cat test14
#!/usr/local/bin/bash
# Using STDOUT, then coming back to it

exec 3>&1
exec 1>test14out

echo "This should be stored in the output file"
echo "along with this line"

exec 1>&3

echo "Now things should be back to normal"

./test14
# Now things should be back to normal
cat test14out
# This should be stored in the output file
# along with this line

Creating input file descriptors

The same trick applies to the input stream. In the example below, we first save the original descriptor 0 (the keyboard) in descriptor 6, and restore it after we're done with the file

cat test15
#!/usr/local/bin/bash
# Redirecting input file descriptors

exec 6<&0
exec 0< testfile

count=1
while read line
do
echo "Line #$count: $line"
count=$[ $count + 1 ]
done

exec 0<&6
read -p "Are you done now? " answer
case $answer in
Y | y) echo "Goodbye" ;;
N | n) echo "Sorry, this is the end." ;;
esac

./test15
# Line #1: line 1
# Line #2: line 2
# Line #3: line 3
# Are you done now? y
# Goodbye

Creating a read/write file descriptor

This example feels a bit... gimmicky. Not very practical, but fun

In the example below, descriptor 3 is opened for both input and output on the same file. We read one line and then write one. After the read, the file position sits at the start of the second line, so our write overwrites the content that was there

cat test16
#!/usr/local/bin/bash
# Redirecting input/output file descriptor

exec 3<> testfile
read line <&3

echo "Read: $line"
echo "This is a test line" >&3

cat testfile
# This is the first line.
# This is the second line.
# This is the third line.

./test16
# Read: This is the first line.

cat testfile
# This is the first line.
# This is a test line
# ine.
# This is the third line.

Closing file descriptors

Newly created file descriptors close automatically when the script ends. But what if you want to close one by hand before the script finishes?

The closing syntax is exec 3>&-. In the experiment below we point descriptor 3 at a file, write some content, close it, and try writing again. As you can see, writing after the close throws an error

cat badtest
#!/usr/local/bin/bash
# Testing closing file descriptors

exec 3> test17file

echo "This is a test line of data" >&3

exec 3>&-

echo "This won't work" >&3

./badtest
# ./badtest: line 10: 3: Bad file descriptor
cat test17file
# This is a test line of data

There is one more important detail: if, within the same script, you close a file descriptor and later reopen the same file, the content written earlier gets overwritten

In the experiment below we write a message to the file through descriptor 3, close it, cat the file, then open it again and write. The earlier content ends up overwritten

cat test17
#!/usr/local/bin/bash
# Testing closing file descriptors

exec 3> test17file
echo "This is a test line of data" >&3
exec 3>&-

cat test17file

exec 3> test17file
echo "This'll be bad" >&3

./test17
# This is a test line of data
cat test17file
# This'll be bad

Listing Open File Descriptors

lsof lists every open file descriptor on the system, which is somewhat contentious permission-wise. macOS has the command too

which lsof
# /usr/sbin/lsof

Show the file descriptor usage of the current process

lsof -a -p $$ -d 0,1,2
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
# bash 35349 i306454 0u CHR 16,1 0t937569 667 /dev/ttys001
# bash 35349 i306454 1u CHR 16,1 0t937569 667 /dev/ttys001
# bash 35349 i306454 2u CHR 16,1 0t937569 667 /dev/ttys001

Explanation of the lsof output

Column Description
COMMAND The first nine characters of the name of the command in the process
PID The process ID of the process
USER The login name of the user who owns the process
FD The file descriptor number and access type. r-read, w-write, u-read/write
TYPE The type of file. CHR-character, BLK-block, DIR-directory, REG-regular file
DEVICE The device numbers(major and minor) of the device
SIZE If available, the size of the file
NODE The node number of the local file
NAME The name of the file

For comparison, here is the file descriptor information from inside a script

cat test18
#!/usr/local/bin/bash
# Testing lsof with file descriptors

exec 3> test18file1
exec 6> test18file2
exec 7< testfile

lsof -a -p $$ -d 0,1,2,3,6,7

./test18
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
# bash 39156 i306454 0u CHR 16,1 0t937995 667 /dev/ttys001
# bash 39156 i306454 1u CHR 16,1 0t937995 667 /dev/ttys001
# bash 39156 i306454 2u CHR 16,1 0t937995 667 /dev/ttys001
# bash 39156 i306454 3w REG 1,5 0 51179042 /Users/i306454/tmp/bash_test/test18file1
# bash 39156 i306454 6w REG 1,5 0 51179043 /Users/i306454/tmp/bash_test/test18file2
# bash 39156 i306454 7r REG 1,5 73 51174255 /Users/i306454/tmp/bash_test/testfile

Suppressing Command Output

Sometimes you don't want to see any error output at all, for example when running in the background. In that case you can redirect STDERR to the null file, located at /dev/null.

ls -al > /dev/null
# cat /dev/null

ls -al badfile test16 2> /dev/null
# -rwxr--r-- 1 i306454 staff 151 May 30 15:12 test16

You can also use the null file as input. This is a quick way to empty a file, essentially a shortcut for rm + touch.

cat testfile
# This is the first line.
# This is a test line
# ine.
# This is the third line.

cat /dev/null > testfile
cat testfile

Using Temporary Files

Linux reserves the /tmp directory for temporary files, and even provides a dedicated command, mktemp, for creating them. Regardless of the umask, files created by mktemp give the creator full permissions and everyone else none.

Creating a local temporary file

The trailing uppercase Xs in the temporary file name are replaced with random characters.

mktemp testing.XXXXXX
ls testing*
# testing.I3p5pe

In the test script, we create a temp file and write to it, then close the descriptor and cat the contents. Finally we remove the file, discarding any error messages.

cat test19
#!/usr/local/bin/bash
# Creating and using a temp file

tempfile=$(mktemp test19.XXXXXX)


exec 3>$tempfile

echo "This script writes to temp file $tempfile"

echo "This is the first line" >&3
echo "This is the second line" >&3
echo "This is the third line" >&3

exec 3>&-

echo "Done creating temp file. the contents are:"
cat $tempfile

rm -f $tempfile 2> /dev/null

./test19
# This script writes to temp file test19.ksgCft
# Done creating temp file. the contents are:
# This is the first line
# This is the second line
# This is the third line
ls test19*
# test19

Creating a temporary file in /tmp

The temp files so far were all created in the current directory. To create them under the system temp directory instead, just add the -t flag... The result differs from the book; presumably macOS has customized this.


mktemp -t test.XXXXXX
# /var/folders/yr/8yr4mzlj1x34tf4m9c_wh2_h0000gn/T/test.XXXXXX.enBtwKYB
cat test20
#!/usr/local/bin/bash
# Creating a temp file in /tmp

tempfile=$(mktemp tmp.XXXXXX)

echo "This is a test file." > $tempfile
echo "This is the second line of the test." >> $tempfile

echo "The temp file is located at: $tempfile"
cat $tempfile
rm -f $tempfile

./test20
# The temp file is located at: tmp.OldzdO
# This is a test file.
# This is the second line of the test.

Creating a temporary directory

-d creates a temporary directory

cat test21
#!/usr/local/bin/bash
# Using a temporary directory

tempdir=$(mktemp -d dir.XXXXXX)
cd $tempdir
tempfile1=$(mktemp temp.XXXXXX)
tempfile2=$(mktemp temp.XXXXXX)

exec 7> $tempfile1
exec 8> $tempfile2

echo "Sending data to directory $tempdir"
echo "This is a test line of data for $tempfile1" >&7
echo "This is a test line of data for $tempfile2" >&8

./test21
# Sending data to directory dir.hE8Hbr

ls -l dir*
# total 16
# -rw------- 1 i306454 staff 44 May 30 16:53 temp.2V4FAk
# -rw------- 1 i306454 staff 44 May 30 16:53 temp.59JjLm

cat dir.hE8Hbr/temp.2V4FAk
# This is a test line of data for temp.2V4FAk
cat dir.hE8Hbr/temp.59JjLm
# This is a test line of data for temp.59JjLm

Logging Messages

Sometimes you want a stream to go both to the screen and to a file. That's what tee is for.

date | tee testfile 
# Sun May 30 16:58:30 CST 2021
cat testfile
# Sun May 30 16:58:30 CST 2021

Note that tee overwrites the file's existing contents by default.

who | tee testfile
# i306454 console May 27 15:12
# i306454 ttys000 May 27 16:21
cat testfile
# i306454 console May 27 15:12
# i306454 ttys000 May 27 16:21

The earlier date output got overwritten. Use -a to append instead.

date | tee -a testfile
# Sun May 30 17:00:41 CST 2021
cat testfile
# i306454 console May 27 15:12
# i306454 ttys000 May 27 16:21
# Sun May 30 17:00:41 CST 2021

A hands-on example

cat test22
#!/usr/local/bin/bash
# Using the tee command for logging

tempfile=test22file

echo "This is the start of the test" | tee $tempfile
echo "This is the second of the test" | tee -a $tempfile
echo "This is the end of the test" | tee -a $tempfile

./test22
# This is the start of the test
# This is the second of the test
# This is the end of the test
cat test22file
# This is the start of the test
# This is the second of the test
# This is the end of the test

Practical Example

Parse a CSV file and reassemble its contents into a SQL file for import.

cat members.csv
Blum,Richard,123 Main St.,Chicago,IL,60601
Blum,Barbara,123 Main St.,Chicago,IL,60601
Bresnahan,Christine,456 Oak Ave.,Columbus,OH,43201
Bresnahan,Timothy,456 Oak Ave.,Columbus,OH,43201

Key syntax notes

  • done < ${1}: feed the file named on the command line into the loop as input
  • read lname…: split each line on commas (via IFS) and give every field a name for easy reference later
  • cat >> $outfile << EOF: cat reads the here document, substitutes the variables, and appends the result to outfile. This works much like tee myfile << EOF..
  • Note: only three INSERT rows come out because the last line of members.csv has no trailing newline, so the final read returns nonzero and the loop ends early
cat test23
#!/usr/local/bin/bash
# Read file and create INSERT statements for MySQL

outfile='members.sql'
IFS=','
while read lname fname address city state zip
do
cat >> $outfile << EOF
INSERT INTO members (lname, fname, address, city, state, zip) VALUES ('$lname', '$fname', '$address', '$city', '$state', '$zip');
EOF
done < ${1}

./test23 members.csv
cat members.sql
INSERT INTO members (lname, fname, address, city, state, zip) VALUES ('Blum', 'Richard', '123 Main St.', 'Chicago', 'IL', '60601');
INSERT INTO members (lname, fname, address, city, state, zip) VALUES ('Blum', 'Barbara', '123 Main St.', 'Chicago', 'IL', '60601');
INSERT INTO members (lname, fname, address, city, state, zip) VALUES ('Bresnahan', 'Christine', '456 Oak Ave.', 'Columbus', 'OH', '43201');

Script Control

Handling Signals

Signal Name Description
1 SIGHUP Hangs up
2 SIGINT Interrupts
3 SIGQUIT Stops running
9 SIGKILL Unconditionally terminates
11 SIGSEGV Produces segment violation
15 SIGTERM Terminates if possible
17 SIGSTOP Stops unconditionally, but doesn’t terminate
18 SIGTSTP Stops or pauses, but continues to run in background
19 SIGCONT Resumes execution after STOP or TSTP

By default, the bash shell ignores the SIGQUIT and SIGTERM signals, but it does handle SIGHUP and SIGINT.

Generating signals

You can generate two signals from the keyboard:

Interrupting a process: Ctrl + C generates a SIGINT signal and sends it to whatever process is currently running in the terminal.

Pausing a process: Ctrl + Z stops (suspends) a process without terminating it.

sleep 100
# ^Z
# [1]+ Stopped sleep 100
exit
# exit
# There are stopped jobs.

While there are stopped jobs, you can't exit bash. You can inspect them with ps; a stopped job shows T in the S (status) column. You can kill it with the kill command.

ps -l 
# UID PID PPID F CPU PRI NI SZ RSS WCHAN S ADDR TTY TIME CMD
# ...
# 501 17153 12576 4006 0 31 0 4268408 672 - T 0 ttys001 0:00.00 sleep 100
# ...
kill -9 17153

Trapping signals

Within a script you can catch and handle specific signals yourself, using the format trap command signals.

In the example below, the script traps the SIGINT signal sent by Ctrl + C; even when the key combination is pressed mid-run, the script keeps going.

cat test1.sh 
#!/usr/local/bin/bash
# Testing signal trapping
#
trap "echo ' Sorry! I have trapped Ctrl-C'" SIGINT
#
echo This is a test script
#
count=1
while [ $count -le 10 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done
#
echo "This is the end of the test script"

./test1.sh
# This is a test script
# Loop #1
# Loop #2
# Loop #3
# Loop #4
# ^C Sorry! I have trapped Ctrl-C
# Loop #5
# Loop #6
# Loop #7
# Loop #8
# Loop #9
# Loop #10
# ^C Sorry! I have trapped Ctrl-C
# This is the end of the test script

Trapping a script exit

The trap command can also run a command when the script exits; this fires even when the script is ended with Ctrl + C.

cat test2.sh 
#!/usr/local/bin/bash
# Trapping the script exit
#
trap "echo Goodbye..." EXIT
#
count=1
while [ $count -le 5 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done
#
echo "This is the end of the test script"

./test2.sh
# Loop #1
# Loop #2
# ^CGoodbye...
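A common use of the EXIT trap is cleaning up temporary files, so they get removed however the script ends, Ctrl + C included. A small sketch of my own combining it with mktemp:

```shell
#!/usr/local/bin/bash
# Remove the temp file on any exit, normal or interrupted

tempfile=$(mktemp cleanup.XXXXXX)
trap "rm -f $tempfile" EXIT

echo "This is scratch data" > $tempfile
echo "Working with temp file $tempfile"
# no explicit rm needed; the EXIT trap removes the file
```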

Modifying or removing a trap

The example below shows how to modify a trap's action. We first define what to echo on SIGINT, then change it after a 5-second loop. Triggering SIGINT again verifies that the change took effect.

cat test3.sh 
#!/usr/local/bin/bash
# Modifying a set trap
#
trap "echo 'Sorry... Ctrl-C is trapped.'" SIGINT
#
count=1
while [ $count -le 5 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done
#
trap "echo ' I modified the trap!'" SIGINT
#
count=1
while [ $count -le 5 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done

./test3.sh
# Loop #1
# Loop #2
# ^CSorry... Ctrl-C is trapped.
# Loop #3
# Loop #4
# ^CSorry... Ctrl-C is trapped.
# Loop #5
# Loop #1
# Loop #2
# ^C I modified the trap!
# Loop #3
# Loop #4
# ^C I modified the trap!
# Loop #5

You can also remove a defined trap with trap -- SIGINT

cat test3b.sh 
#!/usr/local/bin/bash
# Modifying a set trap
#
trap "echo 'Sorry... Ctrl-C is trapped.'" SIGINT
#
count=1
while [ $count -le 5 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done
#
trap -- SIGINT
echo "I just removed the trap"
#
count=1
while [ $count -le 5 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done

./test3b.sh
# Loop #1
# Loop #2
# ^CSorry... Ctrl-C is trapped.
# Loop #3
# Loop #4
# Loop #5
# I just removed the trap
# Loop #1
# Loop #2
# ^C

Tip: the removal above can also be written with a single dash, trap - SIGINT

Running scripts in Background Mode

Consider this situation: your script takes a long time to run, and you start it in the terminal, so the terminal is tied up until it finishes. You can run such scripts in background mode instead; many of the processes ps shows are running in the background.

Running in the background

Running a script in the background is simple: just append an ampersand (&) when you invoke it.

In the program below, we set up a timer and run it in the background. On startup it prints the job number and PID; when the script finishes, a done message is printed.

PS: when testing in bash I had to press Enter before the done message appeared; zsh prints it automatically.

cat test4.sh 
#!/usr/local/bin/bash
# Test running in the background
#
count=1
while [ $count -le 10 ]
do
sleep 1
count=$[ $count + 1 ]
done
#

./test4.sh &
# [1] 16253
# [1] + 16360 done ./test4.sh

Even in background mode, the script still uses the terminal's STDOUT and STDERR.

cat test5.sh                                 
#!/usr/local/bin/bash
# Test running in the background with output
#
echo "Start the test script"
count=1
while [ $count -le 5 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done
#
echo "Test script is complete"

./test5.sh &
# [1] 16474
# bash-5.1$ Start the test script
# Loop #1
# Loop #2
# Loop #3
# Loop #4
# Loop #5
# Test script is complete

# [1]+ Done ./test5.sh

Running multiple background jobs

To start several background jobs, just run multiple xx.sh & commands in the terminal. Each time, the system assigns the background process a job ID and a process ID, which you can inspect with ps.

./test5.sh &
./test5.sh &
ps
# PID TTY TIME CMD
# 8019 ttys000 0:04.39 /bin/zsh --login -i
# 1479 ttys001 0:01.61 /bin/zsh -l
# 16661 ttys001 0:00.01 /usr/local/bin/bash ./test5.sh
# 16681 ttys001 0:00.01 /usr/local/bin/bash ./test5.sh
# 16700 ttys001 0:00.00 sleep 1
# 16702 ttys001 0:00.00 sleep 1
# 3202 ttys002 0:04.87 -zsh
# 9415 ttys003 0:01.10 -zsh

Running Scripts without a Hang-Up

With the nohup command, your script keeps running in the background even after you close the terminal.

nohup detaches the script from the terminal's STDOUT and STDERR and automatically redirects its output to a nohup.out file. If you start several nohup processes in the same directory, their output all gets mixed together in that single file.

nohup ./test1.sh &
cat nohup.out
# This is a test script
# Loop #1
# Loop #2
# Loop #3
# Loop #4
# Loop #5
# Loop #6
# Loop #7
# Loop #8
# Loop #9
# Loop #10
# This is the end of the test script
# [1]+ Done nohup ./test1.sh
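To keep the outputs of multiple nohup jobs from mixing in one nohup.out, redirect each job's output to its own file (the log file names here are my own):

```shell
# each job writes to its own log instead of sharing nohup.out
nohup ./test1.sh > test1.log 2>&1 &
nohup ./test5.sh > test5.log 2>&1 &
```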

Controlling the Job

Job control means the actions of starting, stopping, killing, and restarting jobs.

Viewing jobs

The jobs command lets you view the jobs the shell is currently handling.

In the example below, we start a timer script twice. The first run is stopped midway with Ctrl + Z; the second is run in the background. We then observe both jobs' status with the jobs command; jobs -l also shows the PIDs.

cat test10.sh 
#!/usr/local/bin/bash
# Test job control
#
# $$ to display the PID of process running this script
echo "Script Process ID: $$"
#
count=1
while [ $count -le 10 ]
do
echo "Loop #$count"
sleep 1
count=$[ $count + 1 ]
done
#
echo "End of script..."

./test10.sh
# Script Process ID: 17179
# Loop #1
# Loop #2
# ^Z
# [1]+ Stopped ./test10.sh
./test10.sh > test10.out &
# [2] 17192
jobs
# [1]+ Stopped ./test10.sh
# [2]- Running ./test10.sh > test10.out &
jobs -l
# [1]+ 17179 Suspended: 18 ./test10.sh
# [2]- 17192 Done ./test10.sh > test10.out

Options for the jobs command

Parameter Description
-l List the PID of the process along with the job number
-n Lists only jobs that have changed their status since the last notification from the shell
-p Lists only the PIDs of the jobs
-r Lists only the running jobs
-s Lists only stopped jobs

In the jobs listing you'll see plus and minus signs. The + marks the default job; the - marks the job that will become the default after it. At any time there is only one job with a plus sign and one with a minus sign.

In the experiment below, we start three background scripts and watch the jobs status.

./test10.sh > test10a.out &
# [1] 17444
./test10.sh > test10b.out &
# [2] 17448
./test10.sh > test10c.out &
# [3] 17456
jobs -l
# [1] 17444 Running ./test10.sh > test10a.out &
# [2]- 17448 Running ./test10.sh > test10b.out &
# [3]+ 17456 Running ./test10.sh > test10c.out &

# kill the process
kill 17444

Restarting stopped jobs

Through bash job control you can restart a stopped script, in either background or foreground mode; the latter takes over the terminal.

When several scripts are stopped, use bg plus the job number to restart a specific one.

cat test11.sh 
#!/usr/local/bin/bash
# Test job control
#
count=1
while [ $count -le 10 ]
do
sleep 1
count=$[ $count + 1 ]
done
#
echo "End of script..."

./test11.sh
# ^Z
# [1]+ Stopped ./test11.sh
./test11.sh
# ^Z
# [2]+ Stopped ./test11.sh
jobs -l
# [1]- 17657 Suspended: 18 ./test11.sh
# [2]+ 17659 Suspended: 18 ./test11.sh
bg 2
# [2]+ ./test11.sh &
# End of script...

bgfg 的最主要的区别。如果用 bg, 你还可以在当前终端运行命令,如果是 fg 你需要等命令全部执行完了才能继续运行

Being Nice

Every process on a Linux system has a priority, ranging from -20 (highest) to 19 (lowest). Processes started by the shell default to 0. "Nice guys finish last" is a handy mnemonic.

Using the nice command

To start a process at a specific priority level, use the nice command.

# macOS does not support the cmd column
nice -n 10 ./test4.sh > test4.out &
# [2] 18051
# [1] Done nice -n 10 ./test4.sh > test4.out
ps -p 18051 -o pid,ppid,ni
# PID PPID NI
# 18051 16881 10

Using the renice command

While a process is running, you can adjust its priority with renice.

./test11.sh &
# [1] 18154
ps -p 18154 -o pid,ni
# PID NI
# 18154 0
renice -n 10 -p 18154
ps -p 18154 -o pid,ni
# PID NI
# 18154 10

Like nice, renice comes with these limitations:

  • You can only renice processes you own
  • You can only lower the priority, not raise it
  • The root user can renice to any level

Running Like Clockwork

In this section we use the at and cron commands to run our scripts on a schedule.

Scheduling a job using the at command

at lets you run a script on the system at a specified time. Most Linux systems start an atd daemon at boot, which periodically checks the spool directory (/var/spool/at) and runs the jobs found there.

Understanding the at command format

The basic format of at is simple: at [-f filename] time. at recognizes many time formats:

  • A standard hour and minute, such as 10:15
  • An AM/PM indicator, such as 10:15PM
  • A specific named time, such as now, noon, midnight or teatime(4PM)

You can also specify a date, in several formats:

  • A standard date format, such as MMDDYY, MM/DD/YY, or DD.MM.YY
  • A text date, such as Jul 4 or Dec 25, with or without the year
  • A time increment:
    • Now + 25 minutes
    • 10:15PM tomorrow
    • 10:15 + 7 days
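Put together, the time specs plug into the at command line like this (myscript.sh is a placeholder; each line just submits a job and prints the usual "job N at …" confirmation):

```shell
# each line submits myscript.sh to run at a different time
at -f myscript.sh 10:15                # today at 10:15
at -f myscript.sh teatime              # 4PM
at -f myscript.sh 10:15PM tomorrow     # tomorrow night
at -f myscript.sh now + 25 minutes     # 25 minutes from now
```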

When you use the at command, the job is submitted to a job queue. There are multiple queues to choose from, identified by single letters, a-z and A-Z.

Note There used to be a separate batch command for running a script when system usage is low; nowadays it just schedules the script by submitting it to at's b queue. The further down the alphabet a queue's letter, the lower the job's priority. The default queue is a.

Retrieving job output

On a Linux system, the job runs with no terminal attached, so there is nowhere to watch its output. Instead, the system captures the output in an email and sends it to the job's owner.

cat test13.sh 
#!/usr/local/bin/bash
# Test using at command
#
echo "This script ran at $(date +%B%d,%T)"
sleep 5
echo "This is the script's end"
#

at -f test13.sh now
# job 1 at Thu Jun 3 12:27:19 2021

If your system has no mail configured, you won't receive anything; you can redirect the output to a file yourself instead.

PS: this experiment failed for me; after running it, no out file ever appeared.

cat test13b.sh 
#!/usr/local/bin/bash
# Test using at command
#
echo "This script ran at $(date +%B%d,%T)" > test13b.out
echo >> test13b.out
sleep 5
echo "This is the script's end" >> test13b.out
#

# checked: the at on this machine has no -M option (on Linux, -M suppresses the mail)
at -M -f test13b.sh now
# job 3 at Thu Jun 3 12:34:11 2021

Listing pending jobs

atq lists all pending at jobs. I assumed jobs would do this too and fiddled with that for quite a while.

atq
# 1 Thu Jun 3 12:27:00 2021
# 4 Thu Jun 3 12:35:00 2021
# 5 Thu Jun 3 12:35:00 2021
# 2 Thu Jun 3 12:32:00 2021
# 3 Thu Jun 3 12:34:00 2021
# 6 Thu Jun 3 16:00:00 2021

Removing jobs

Delete a job from the at queue

atrm 1
atq
# 4 Thu Jun 3 12:35:00 2021
# 5 Thu Jun 3 12:35:00 2021
# 2 Thu Jun 3 12:32:00 2021
# 3 Thu Jun 3 12:34:00 2021
# 6 Thu Jun 3 16:00:00 2021

Does the Mac have some special setting? The at jobs I started earlier are all blocked. (On macOS the atrun daemon is disabled by default, which would explain it.)

Scheduling regular scripts

at only schedules one-off jobs. For recurring jobs, use cron instead. cron runs in the background and checks the cron tables for jobs that are due to run.

Looking at the cron table

The cron job syntax is min hour dayofmonth month dayofweek command. Examples follow:

# * means "every"
# the schedule below fires every day at 10:15
15 10 * * * command

# every Monday at 16:15
15 16 * * 1 command
# dayofweek can also be a three-letter name: mon, tue, wed, thu, fri, sat, sun

# 12:00 on the first day of every month
00 12 1 * * command
# dayofmonth ranges from 1 to 31

Note How do you schedule a job for the last day of the month? By checking whether tomorrow is the first. Example: 00 12 * * * if [ $(date +%d -d tomorrow) = 01 ] ; then command ; fi — every day at noon, test whether tomorrow is the first of the next month, and run command if so.

A cron job must give the script's full path: 15 10 * * * /home/rich/test4.sh > test4out

Building the cron table

List the current user's cron jobs

crontab -l
# crontab: no crontab for i306454
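To add entries, crontab -e opens your table in an editor; you can also install a prepared table from a file. A sketch reusing the full-path entry from above (the table file name is my own):

```shell
# install a prepared cron table from a file
echo "15 10 * * * /home/rich/test4.sh > test4out" > mycrontable
crontab mycrontable
crontab -l
# 15 10 * * * /home/rich/test4.sh > test4out
```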
View cron directories

These cron directories don't exist on my Mac, so I'm skipping this part for now.

Starting scripts with a new shell

The book frames this as setting shell features, so this part is really about configuration.

When a shell starts, it looks for startup files in the order below; once one is found, the rest are ignored:

  • $HOME/.bash_profile
  • $HOME/.bash_login
  • $HOME/.profile

The book's wording is that each new shell "runs the .bashrc", so the rc file feels more like the place to add features (my own speculation; personally I stuff everything into the rc file and it works).

Every time a bash shell starts, it runs the contents of .bashrc.
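This is why a typical .bash_profile ends by sourcing .bashrc, so login shells pick up the rc contents too. A minimal sketch of that stanza:

```shell
# near the end of ~/.bash_profile: pull in ~/.bashrc for login shells as well
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
```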