POSIX Shell Array/List Data Structures

2017-06-08

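1. Overview

In this article, we look at how arrays/lists can be implemented in the Linux shell, and at their quirks.

2. Bash vs. POSIX Shell Arrays

Compared with a basic POSIX-compliant shell, bash arrays are more powerful and more convenient to use. Let's illustrate this by creating an indexed array and accessing its third member.

In bash, we declare an array with the (…) syntax: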

$ array=(item1 item2 item3)
$ echo "${array[2]}"
item3

In a POSIX shell, we declare the array with set:

$ set -- item1 item2 item3
$ echo "$3" # Array indices start from 1
item3

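We can see that bash has a more concise array syntax, which makes it easier to use for more complex operations.

3. Creating Arrays

A POSIX shell has no first-class support for arrays. However, we can use the list of positional parameters as an array.

Positional parameters are all the arguments passed to a shell script or function. For example, in my_function 1 2 3, the numbers after my_function are the positional parameters.

We can modify the array with the set builtin and access its items through $@, which expands to all positional parameters, that is, our array: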

$ set -- 1 2 3
$ echo "$@"
1 2 3

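The double dash (--) marks the end of options, so an item that begins with a dash is stored as an item rather than parsed as an option to set.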
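As a rough illustration of why the double dash and the quotes around $@ matter, the sketch below stores an item that starts with a dash and an item that contains a space. array_length is a hypothetical helper introduced only for this example.

```shell
#!/bin/sh
# array_length is a hypothetical helper: it prints how many items it received.
array_length() {
    echo "$#"
}

# "--" ends option parsing, so "-n" is stored as an item instead of
# being interpreted as an option to set.
set -- -n "two words" 3

array_length "$@"   # prints 3: quoted "$@" keeps "two words" as one item
array_length $@     # prints 4: unquoted $@ is split again on whitespace
```

The number of items is always available as $#, so no separate length bookkeeping is needed.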

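4. Basic Array Operations

Now, let's look at the array operations we can perform.

4.1. Adding Items

To add items, we pass the original array to set together with the new items: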

$ set -- 1 2 3
$ echo "$@" # Original array
1 2 3
$ set -- "$@" 4
$ echo "$@" # New array
1 2 3 4

Here, "$@" expands to all of our original positional parameters, that is, the items of the array.
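The same idea works at the front of the list. Here is a minimal sketch, wrapped in a hypothetical demo_add function so that it does not disturb the caller's own positional parameters (a function has its own set while it runs):

```shell
#!/bin/sh
# demo_add is a hypothetical name used only for this illustration.
demo_add() {
    set -- 2 3           # Start with two items
    set -- 1 "$@"        # Prepend: new item goes before "$@"
    set -- "$@" 4 5      # Append: new items go after "$@"
    echo "$@"
}

demo_add   # prints: 1 2 3 4 5
```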

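4.2. Removing Items

We can easily remove several items from the beginning of the array, but removing an item at an arbitrary position is tricky.

To remove items from the front, we use the shift builtin and pass it the number of items to remove: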

$ set -- 1 2 3
$ shift 2 # Remove first 2 items
$ echo "$@"
3

There's no direct way to remove the item at a given index, so let's write a function for it:

# Argument 1: The index to remove
# Argument 2: The array
# Usage: set -- $(array_remove N "$@")
# The output is split on whitespace again by the unquoted $(...),
# so items must not contain whitespace or glob characters.
array_remove() {
    index="$1"
    shift # Remove the index from argument list
    counter=1 # Array indexing starts from 1
    # Print elements up to the index, "-lt" means less than.
    while [ "$counter" -lt "$index" ]; do
        counter=$((counter + 1)) # Increment counter
        echo "$1" # First item of current array
        shift # Move to the next item
    done
    # Skip the element at the removal index, we've printed everything before it.
    shift
    # Print the rest of the array.
    echo "$@"
}

Let's test it:

$ set -- 1 2 3 4 5
$ set -- $(array_remove 4 "$@") # Remove at 4th index
$ echo "$@"
1 2 3 5

Note that this method is inefficient, because we walk through the array one item at a time up to the given index.
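When items may contain spaces, one possible alternative is to rebuild the list in place instead of printing it and splitting it again. The sketch below relies on the fact that for item do … done takes its list from the positional parameters once, before the loop body runs, so calling set inside the loop does not disturb the iteration. remove_demo is a hypothetical wrapper used only to make the example testable; in real use the loop would be inlined, since a function cannot change its caller's positional parameters.

```shell
#!/bin/sh
# remove_demo is a hypothetical name: it removes the item at index $1 from
# the remaining arguments and prints what is left, one item per line.
remove_demo() {
    index="$1"
    shift # Drop the index; "$@" is now the array
    i=0
    for item do
        i=$((i + 1))
        # On the first pass, empty the list so that it can be rebuilt.
        [ "$i" -eq 1 ] && set --
        # Re-append every item except the one at the removal index.
        [ "$i" -eq "$index" ] || set -- "$@" "$item"
    done
    printf '<%s>\n' "$@"
}

remove_demo 2 "a b" "c d" "e f"
# prints:
# <a b>
# <e f>
```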

4.3. Indexing

We can index the array with the parameter ${N}, where N is the desired index:

$ set -- 3 2 1
$ echo "${3}"
1

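**We must wrap the number in curly braces to allow indices longer than one digit.** For example, the shell evaluates "$98" as the value of "$9" followed by the string "8". Writing "${98}" prevents this behavior.

However, if the index is stored in a variable, we need to resort to eval: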

$ set -- one two three
$ index=3
$ eval "echo \${${index}}"
three

To avoid eval, we can write a function that receives the array, skips N elements, and then prints the first remaining element:

# Argument 1: The index
# Argument 2: The array
array_index() {
    shift "$1" # Shift N number of elements, including the first argument
    # Return non-zero if index is out of bounds ($1 will be empty)
    echo "${1:?Index out of bounds}" # Print the first item after shifting
}

Let's run it:

$ set -- 0 1 2 3 4 5 6 7 8 9 10
$ array_index 12 "$@"
/bin/sh: 1: Index out of bounds
$ array_index 11 "$@"
10
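If an out-of-range index should be reported through the return status rather than through an error message from the shell, a variant along these lines might be used. This is only a sketch: array_get is a hypothetical name, and the check accepts only positive decimal integers without leading zeros.

```shell
#!/bin/sh
# array_get is a hypothetical variant of array_index.
# Argument 1: The index (starting from 1)
# Argument 2: The array
array_get() {
    index="$1"
    shift # Drop the index; "$@" is now the array
    case "$index" in
        ''|0*|*[!0-9]*) return 1 ;; # Not a positive decimal integer
    esac
    [ "$index" -le "$#" ] || return 1 # Past the end of the array
    shift "$((index - 1))" # Skip the items before the requested one
    printf '%s\n' "$1"
}

set -- 0 1 2 3 4 5 6 7 8 9 10
array_get 11 "$@"                          # prints: 10
array_get 12 "$@" || echo "no such item"   # prints: no such item
```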

4.4. Iterating

We use for to iterate over an array:

$ set -- 1 2 3
$ for item in "$@"; do echo "$item"; done
1
2
3

The in "$@" part is optional here: for item; do …; done also iterates over the positional parameters.
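Since the loop gives us the items but not their positions, we can keep a counter next to it when the index is needed. A small sketch, with print_indexed as a hypothetical helper name:

```shell
#!/bin/sh
# print_indexed is a hypothetical helper: it prints each item with its index.
print_indexed() {
    i=1 # Array indexing starts from 1
    for item do # Same as: for item in "$@"; do
        printf '%s: %s\n' "$i" "$item"
        i=$((i + 1))
    done
}

print_indexed apple "banana split" cherry
# prints:
# 1: apple
# 2: banana split
# 3: cherry
```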

4.5. Generating Arrays From Commands

We can also pass a command substitution as the arguments to set to generate an array. Say we want an array of 100 integers:

$ set -- $(seq 100)
$ echo "$@"
1 2 3 4 5 6 7 8 9 10 11 12 ...

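The seq command generates the numbers in a given range. Note that seq is not specified by POSIX, although it is widely available.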
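The unquoted expansion splits on any whitespace and also expands glob characters, which is fine for numbers but not for arbitrary text. One way to turn each line of a command's output into one item is to split on newlines only and switch globbing off while doing so. The sketch below runs in a subshell function body so that the changes to IFS and set -f do not leak; split_lines is a hypothetical name.

```shell
#!/bin/sh
# split_lines is a hypothetical helper. Its body is a subshell "( ... )",
# so the IFS and "set -f" changes vanish when it returns.
# Argument 1: Text with one item per line
split_lines() (
    IFS='
'
    set -f # Disable globbing so that "*" stays a literal item
    set -- $1 # Unquoted on purpose: split on newlines only
    printf '%s items, first: %s\n' "$#" "$1"
)

split_lines "first line
second line
*"
# prints: 3 items, first: first line
```

Empty lines are dropped by this approach, because a newline in IFS counts as whitespace and consecutive separators collapse.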

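5. Associative Arrays/Hash Maps

If we need hash maps in the shell, we should consider using a more powerful language, such as Python.

It is still possible to implement them with files and interact with them through functions, but more complex operations such as nested keys cannot be implemented cleanly.

In addition, reading or creating a key has more latency, because the system has to open a new file descriptor each time to access the data.

5.1. Implementation

For the implementation, we simply use file names as the hash keys and the file contents as the values.

We also take a checksum of the key instead of using the key string directly as the file name. This lets us not only get around file name length limits but also avoid stray slashes in the name. For example, creating a file named "filewith/slash" would fail, because the slash separates directories.

The hash table directory itself is created with mktemp: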

hm_create() {
    # Create a temporary directory and return its name
    mktemp -d
}
# Lazy hash function that just generates a checksum.
# Feel free to replace this with a more secure checksum like sha256.
# md5sum prints the checksum followed by "  -"; the whole line becomes the file name.
hm_hash() {
    echo "$1" | md5sum -
}
# Argument 1: Hash Table
# Argument 2: Key
# Argument 3: Value
hm_put() {
    echo "$3" > "$1/$(hm_hash "$2")"
}
# Argument 1: Hash Table
# Argument 2: Key
hm_delete() {
    rm -f "$1/$(hm_hash "$2")"
}
# Argument 1: Hash Table
# Argument 2: Key
hm_get() {
    cat "$1/$(hm_hash "$2")"
}

5.2. Usage

Let's create a hash table with a few keys and print their values:

$ hm="$(hm_create)"
$ echo "Created hashmap $hm"
Created hashmap /tmp/tmp.K6Kuuv
$ hm_put "$hm" mykey myvalue
$ hm_put "$hm" hash table
$ hm_get "$hm" hash
table
$ hm_get "$hm" mykey
myvalue
$ hm_delete "$hm" hash
$ hm_get "$hm" hash # Deleted key "hash" doesn't exist, will raise an error.
cat: can't open '/tmp/tmp.K6Kuuv/4e76434eea3c9d9cf9cb10bbf3f4a74b  -': No such file or directory

We can then delete the entire hash table with a simple rm -rf on the $hm directory.
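Since hm_get prints an error for a missing key, it can be handy to test for a key first or to fall back to a default. The following is a sketch under the same file-per-key design; hm_has and hm_get_or are hypothetical additions, and hm_create, hm_hash, and hm_put are repeated from above only so that the block runs on its own.

```shell
#!/bin/sh
# Repeated from the implementation above so that this sketch is self-contained.
hm_create() { mktemp -d; }
hm_hash() { echo "$1" | md5sum -; }
hm_put() { echo "$3" > "$1/$(hm_hash "$2")"; }

# Hypothetical helper: succeeds if the key exists.
# Argument 1: Hash Table
# Argument 2: Key
hm_has() {
    [ -f "$1/$(hm_hash "$2")" ]
}

# Hypothetical helper: prints the value, or a default if the key is missing.
# Argument 1: Hash Table
# Argument 2: Key
# Argument 3: Default value
hm_get_or() {
    if hm_has "$1" "$2"; then
        cat "$1/$(hm_hash "$2")"
    else
        echo "$3"
    fi
}

hm="$(hm_create)"
hm_put "$hm" color blue
hm_put "$hm" "dir/name" ok      # The slash is harmless: only the checksum is used
hm_get_or "$hm" color unknown   # prints: blue
hm_get_or "$hm" size unknown    # prints: unknown
rm -rf "$hm"                    # Delete the whole table
```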