Deleting All Lines in a File From a Specific Line Onward

codingman included in Linux

2016-08-28 3384 words 7 minutes

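1. Overview

When we work on the Linux command line, we often need to manipulate text files.

Removing lines from a text file is a common operation: for example, removing the first line of a file, removing from file B the lines that appear in file A, removing the last N lines of a file, and so on.

In this tutorial, we'll look at how to delete lines from a given line number through the end of a file.

2. Introduction to the Problem

The problem isn't hard to understand, but an example will make it concrete.

Let's say we have an input file called input.txt: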

$ nl input.txt
     1	I am the 1st line.
     2	I am the 2nd line.
     3	I am the 3rd line.
     4	I am the 4th line.
     5	I am the 5th line.
     6	I am the 6th line.
     7	I am the 7th line.
     8	I am the 8th line.

As the output above shows, we've used the nl command to print the file's content with line numbers.
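To follow along, the sample file can be recreated with a single command. This is only a convenience sketch: it relies on printf reusing its format string once per argument, and it writes to input.txt in the current directory.

```shell
# Recreate the 8-line sample file; printf repeats the format for each argument.
printf 'I am the %s line.\n' 1st 2nd 3rd 4th 5th 6th 7th 8th > input.txt

# Print it with line numbers, as in the listing above.
nl input.txt
```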

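Now, let's say our goal is to delete all lines from line 5 to the end of the file.

There are two variants of this problem:

Delete all lines starting from the given line number through the end of the file (the given line won't appear in the result).
Delete all lines after the given line number through the end of the file (the given line will appear in the result).

We'll cover both scenarios in this tutorial.

There are many ways to do this on the Linux command line. In this tutorial, we'll explore four approaches:

Using pure Bash
Using the head command
Using the sed command
Using the awk command

Next, let's see them in action.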

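3. Using Pure Bash

Bash is the default shell on most modern Linux distributions, so a pure Bash solution doesn't need any extra dependencies.

Next, let's see how to solve the problem with a simple Bash script.

3.1. Deleting All Lines After the Given Line

First, let's look at a shell script, rmLines_v1.sh, that deletes the lines after the given line but not the line itself. That is, the given line stays in the result: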

$ cat rmLines_v1.sh
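#!/bin/bash
# Usage: rmLines_v1.sh <FILENAME> <LINE_NUMBER>
FILE="$1"
LINE_NO=$2
i=1
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
    echo "$line"
    # Stop right after printing the given line
    if [ "$i" -eq "$LINE_NO" ]; then
        break
    fi
    i=$(( i + 1 ))
done <"$FILE"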

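The shell script is pretty simple. It takes two arguments: the file name and the line number.

The main part of the script is a while loop that reads and prints lines up to the given line number. We declare a counter variable i and increment it in the loop so that we know when to stop printing.

Let's run the script on our example input file: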

$ ./rmLines_v1.sh input.txt 5
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
I am the 5th line.

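In this test, we passed 5 as the second argument, and the fifth line is in the output. So, it works as expected. Note that although the script prints the desired output, it doesn't change the original input.txt file. If we want to write the change back to the input file, shell redirection helps: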

$ ./rmLines_v1.sh input.txt 5 > tmp.result && mv tmp.result input.txt 
$ cat input.txt
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
I am the 5th line.
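The temporary file is needed because redirecting straight to input.txt would truncate the file before it is read. The same idea can be sketched as a small reusable function. The name overwrite_with is hypothetical and not part of the original scripts, and the sketch assumes mktemp is available:

```shell
# overwrite_with FILE COMMAND [ARGS...]
# Run COMMAND, capture its output in a temp file next to FILE, and
# replace FILE only if COMMAND succeeded. (Hypothetical helper.)
overwrite_with() {
    local file="$1"
    shift
    local tmp
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        # Note: the replaced file takes on the temp file's permissions.
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

With this in place, a call such as overwrite_with input.txt ./rmLines_v1.sh input.txt 5 would leave input.txt untouched if the script fails.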

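Next, let's modify rmLines_v1.sh so that it also deletes the given line.

3.2. Deleting the Given Line and All Lines After It

The requirement is easy to understand. For example, if we pass line 5 as the argument, line 5 should be deleted as well.

This isn't a challenge for us. We can adapt the rmLines_v1.sh script in either of two ways:

Move the echo "$line" line to after the if block, so that we break out of the loop before printing the given line
Change the if condition from [ "$i" -eq "$LINE_NO" ] to [ "$i" -eq "$(( LINE_NO - 1 ))" ] (this only works for line numbers of 2 and above, since for line 1 the condition never matches)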
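The first of these two options can be sketched as a function. The name rm_from_line is hypothetical; the loop body is the one from rmLines_v1.sh with the echo moved below the if block:

```shell
# rm_from_line FILE LINE_NO
# Print FILE up to, but not including, line LINE_NO. (Hypothetical name.)
rm_from_line() {
    local file="$1" line_no="$2" i=1 line
    while IFS= read -r line; do
        # Test first: leave the loop before the given line is printed
        if [ "$i" -eq "$line_no" ]; then
            break
        fi
        echo "$line"
        i=$(( i + 1 ))
    done < "$file"
}
```

Because the test happens before the echo, passing 1 as the line number prints nothing at all.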

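We could create two scripts to handle the inclusive and exclusive scenarios separately. However, we'd then have to maintain two scripts whenever a new requirement comes up.

It would be better to have a single script that works for both cases.

Next, let's see how to achieve that.

3.3. One Script Covering Both Scenarios

First, let's look at the second version of the script: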

$ cat rmLines_v2.sh
#!/bin/bash
err_usage(){
    echo "The Arguments are not accepted!"
    echo "Usage: $0 <-i or -e> <FILENAME> <FROM_LINE_NUMBER>"
    echo "-i : Remove lines including the given line."
    echo "-e : Remove lines excluding the given line."
    exit 1
}
if [ $# -ne 3 ]; then
    err_usage
fi
FILE="$2"
LINE_NO=$3
case "$1" in
    -i)
        LINE_NO=$(( LINE_NO - 1 ))
        ;;
    -e)
        ;;
    *)
        err_usage
        ;;
esac
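i=1
while IFS= read -r line; do
    # Check before printing, so LINE_NO=0 (from "-i ... 1") prints nothing
    if [ "$i" -gt "$LINE_NO" ]; then
        break
    fi
    echo "$line"
    i=$(( i + 1 ))
done <"$FILE"

As the script above shows, we've introduced a new argument, -i or -e, to choose the deletion semantics: including or excluding the given line, respectively. Apart from that, we've added a new if check and a new case block. The if block verifies that the user has passed exactly three arguments to the script. The loop now tests the counter before printing, so that -i with line 1 prints nothing rather than the whole file.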

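The case statement checks whether the first argument is -i or -e. If the user passes -i, it decrements the LINE_NO variable by one.

The argument parsing in this script is only an example. More advanced argument parsing is covered in "How to Use Command Line Arguments in a Bash Script".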
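As one illustration of what more structured parsing might look like, here is a minimal sketch built on Bash's getopts builtin. The function name parse_offset and its "offset" output are assumptions made for this example, not part of rmLines_v2.sh:

```shell
# parse_offset OPTION
# Map -i to 1 and -e to 0: the amount to subtract from the line number.
# Returns non-zero for a missing or unknown option. (Hypothetical helper.)
parse_offset() {
    local opt OPTIND=1 offset=""
    while getopts "ie" opt; do
        case "$opt" in
            i) offset=1 ;;
            e) offset=0 ;;
            *) return 1 ;;
        esac
    done
    [ -n "$offset" ] || return 1
    echo "$offset"
}
```

A script could then compute the last line to keep with something like offset=$(parse_offset "$1") && LINE_NO=$(( $3 - offset )).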

Finally, let's test the second version of the script with the -i and -e options to see whether it solves our problem:

$ ./rmLines_v2.sh -i input.txt 5
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
$ ./rmLines_v2.sh -e input.txt 5
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
I am the 5th line.

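Great! As the output shows, a single script now covers both scenarios. So, we've solved the problem using pure Bash.

4. Using the head Command

Although the pure Bash solution doesn't require any external programs, we have to implement every step ourselves, such as the loop.

The Linux command-line toolbox contains many widely used text-processing utilities. Often, they can solve our problem in a compact and straightforward way.

The head command is probably the most obvious way to solve the problem, because head outputs the first part of a file. That's exactly what we're looking for.

4.1. Deleting All Lines After the Given Line

We can use head's -n X option to get the first X lines: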

$ head -n 5 input.txt 
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
I am the 5th line.

We may want to replace the hard-coded 5 in the head command above with a shell variable, so that the command is easy to build in a script:

$ head -n "$LINE_NO" input.txt

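4.2. Deleting the Given Line and All Lines After It

The head command has no built-in switch for including or excluding the given line. However, we can adjust the X in the -n X option to get there: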

$ LINE_NO=5
$ head -n "$(( LINE_NO - 1 ))" input.txt
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.

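It isn't hard to wrap the head command in a shell script similar to the one we wrote for the pure Bash solution.

We can let the shell script handle the arguments, such as the "inclusive" or "exclusive" option, and adjust the LINE_NO variable used in the head command.

So, with head too, a single script can cover both scenarios.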
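Such a wrapper might look like the sketch below. The function name keep_head is hypothetical, and the -i and -e options simply mirror the ones used by rmLines_v2.sh:

```shell
# keep_head MODE FILE LINE_NO
#   -i: delete LINE_NO and everything after it
#   -e: keep LINE_NO, delete everything after it
keep_head() {
    local mode="$1" file="$2" line_no="$3"
    case "$mode" in
        -i) line_no=$(( line_no - 1 )) ;;
        -e) ;;
        *)  echo "Usage: keep_head <-i|-e> <FILE> <LINE_NO>" >&2
            return 1 ;;
    esac
    # Nothing to keep; also avoids "head -n 0", which some head
    # implementations reject.
    [ "$line_no" -gt 0 ] || return 0
    head -n "$line_no" "$file"
}
```

For example, keep_head -i input.txt 5 prints the first four lines, while keep_head -e input.txt 5 prints the first five.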

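5. Using the sed Command

So far, we've solved the problem with pure Bash and the handy head command.

Now, let's see how sed solves it.

5.1. Deleting All Lines After the Given Line

sed can solve the problem in several ways. For example, each of these commands does the job (assuming we've stored the value 5 in the LINE_NO shell variable):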

sed -n "1,$LINE_NO p;$(( LINE_NO + 1 )) q" input.txt
sed "$(( LINE_NO + 1 )),$ d" input.txt
sed "1, $LINE_NO ! d" input.txt

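Note that we've wrapped the sed scripts in double quotes in all of the commands above so that the shell variable is expanded.

Also, in the last command, we've put a space between the ! and the d command. This is because if we write "!d" inside double quotes, the ! character triggers Bash's history expansion by default in an interactive shell.

All three commands produce the same output. However, in terms of performance, the first command beats the other two, even though it's the longest.

This is because the first command reads only up to line LINE_NO+1. After that, sed quits (q) without processing the rest of the input file. This is particularly helpful when we work with large input files.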
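The claim that the three commands are interchangeable is easy to check. The sketch below runs each one against a scratch copy of the sample file and compares the results; the variable names out_quit, out_del, and out_neg are made up for this example:

```shell
# Build a scratch copy of the sample file.
sample=$(mktemp)
printf 'I am the %s line.\n' 1st 2nd 3rd 4th 5th 6th 7th 8th > "$sample"
LINE_NO=5

# 1) print lines 1..LINE_NO, then quit on the next line
out_quit=$(sed -n "1,$LINE_NO p;$(( LINE_NO + 1 )) q" "$sample")
# 2) delete from LINE_NO+1 through the last line ($)
out_del=$(sed "$(( LINE_NO + 1 )),\$ d" "$sample")
# 3) delete every line that is NOT in the range 1..LINE_NO
out_neg=$(sed "1,$LINE_NO ! d" "$sample")

if [ "$out_quit" = "$out_del" ] && [ "$out_del" = "$out_neg" ]; then
    echo "All three commands agree"
fi
```

Only the first command stops reading early. The other two still scan the whole file even though their output is identical.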

Now, let's take the first command as an example and test it:

$ echo $LINE_NO
5
$ sed -n "1,$LINE_NO p; $(( LINE_NO + 1 )) q" input.txt
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
I am the 5th line.

It's worth mentioning that sed has an -i option, which lets us write the change back to the file.
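As a quick illustration of -i, the sketch below edits a scratch copy in place. It assumes GNU sed; BSD and macOS sed expect an explicit backup suffix after -i (for example, -i ''), so the command would need adjusting there:

```shell
# Work on a scratch copy so the real input.txt stays untouched.
work=$(mktemp)
printf 'I am the %s line.\n' 1st 2nd 3rd 4th 5th 6th 7th 8th > "$work"
LINE_NO=5

# GNU sed: delete lines LINE_NO+1 through the end, editing the file in place.
sed -i "$(( LINE_NO + 1 )),\$ d" "$work"

cat "$work"
```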

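5.2. Deleting the Given Line and All Lines After It

Changing our sed command to handle the "inclusive deletion" scenario isn't hard. A little arithmetic on the LINE_NO variable does it: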

$ sed -n "1,$(( LINE_NO - 1 )) p; $LINE_NO q" input.txt
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.

We've solved the problem using the sed command.

If we want a single sed command to work for both scenarios, we can wrap it in a shell script similar to the pure Bash solution.
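One possible shape for that wrapper is sketched below. The function name keep_sed is hypothetical. The explicit guard for "nothing to keep" is an addition: without it, an address range such as 1,0 would still match line 1 in sed.

```shell
# keep_sed MODE FILE LINE_NO
#   -i: delete LINE_NO and everything after it
#   -e: keep LINE_NO, delete everything after it
keep_sed() {
    local mode="$1" file="$2" line_no="$3" last
    case "$mode" in
        -i) last=$(( line_no - 1 )) ;;
        -e) last=$line_no ;;
        *)  echo "Usage: keep_sed <-i|-e> <FILE> <LINE_NO>" >&2
            return 1 ;;
    esac
    # Nothing to keep: skip sed entirely.
    [ "$last" -gt 0 ] || return 0
    # Print lines 1..last, then quit as soon as the next line is read.
    sed -n "1,$last p;$(( last + 1 )) q" "$file"
}
```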

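6. Using the awk Command

awk is another powerful text-processing utility. Now, let's see how awk solves the problem.

6.1. Deleting All Lines After the Given Line

Like sed, awk can solve the problem in a very compact way: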

awk 'NR <= 5' input.txt

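However, we'd like to make the command more flexible. So, we extract the hard-coded 5 into an awk variable, lineNo. Moreover, for better performance, we stop processing the file as soon as the current line number is greater than the value of lineNo: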

$ awk -v lineNo='5' 'NR > lineNo{exit};1' input.txt 
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
I am the 5th line.

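6.2. Deleting the Given Line and All Lines After It

There are several ways to change the awk command to do "inclusive deletion". A straightforward one is to replace the variable lineNo with (lineNo - 1):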

$ awk -v lineNo='5' 'NR > (lineNo-1){exit};1' input.txt
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.

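We can also build a single command that covers both "inclusive deletion" and "exclusive deletion". This is easier with awk, since it supports variables and has its own scripting language: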

$ awk -v opt="i" -v lineNo="5" 'NR > lineNo-( opt == "i"? 1 : 0 ){exit};1' input.txt 
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
$ awk -v opt="e" -v lineNo="5" 'NR > lineNo-( opt == "i"? 1 : 0 ){exit};1' input.txt 
I am the 1st line.
I am the 2nd line.
I am the 3rd line.
I am the 4th line.
I am the 5th line.

Of course, if we like, we can also wrap the awk command in a small shell script and pass opt and lineNo in from shell variables, just as we discussed for the sed command.
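A minimal sketch of such a wrapper follows. The function name keep_awk is hypothetical; the awk program itself is the one shown above, with opt and lineNo filled in from shell variables through -v:

```shell
# keep_awk MODE FILE LINE_NO
#   -i: delete LINE_NO and everything after it
#   -e: keep LINE_NO, delete everything after it
keep_awk() {
    local mode="$1" file="$2" line_no="$3" opt
    case "$mode" in
        -i) opt=i ;;
        -e) opt=e ;;
        *)  echo "Usage: keep_awk <-i|-e> <FILE> <LINE_NO>" >&2
            return 1 ;;
    esac
    # -v hands the shell values to awk as awk variables.
    awk -v opt="$opt" -v lineNo="$line_no" \
        'NR > lineNo - (opt == "i" ? 1 : 0) {exit};1' "$file"
}
```

Passing the values with -v keeps the awk program in single quotes, so the shell never has to expand anything inside it.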