
Compare Words in Two Files in Linux

1. Overview

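In this tutorial, we'll learn how to compare two files word by word on the Linux command line. Linux already has a command for comparing two files, diff. However, diff compares files line by line, so it can't tell us which words within those lines differ.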

Here, we'll use another command, wdiff, to show the word-level differences between two files.

2. Using wdiff

wdiff isn't installed by default, so we need to install it first (here with apt, on a Debian-based system):

$ sudo apt install wdiff

Once the installation finishes, we can verify that wdiff is available:

$ wdiff --version
wdiff (GNU wdiff) 1.2.2
Copyright (C) 1992, 1997, 1998, 1999, 2009, 2010, 2011, 2012 Free Software
Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Franc,ois Pinard <...>.

Now suppose we have two text files, first.txt and second.txt. We can use cat to print their contents:

$ cat first.txt
the quick brown fox jumps over the lazy dog
the woman snores and her husband wakes up
the sky is blue
$ cat second.txt
The quick yellow fox jumps over the sleeping dog
The man snores and his wife wakes up
It's raining
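To follow along, the two sample files can be recreated with here-documents. This is a minimal sketch that writes first.txt and second.txt into the current directory, overwriting any existing files with those names:

```shell
#!/bin/sh
# Recreate the two sample files used in this tutorial.
# The quoted 'EOF' delimiter prevents the shell from expanding anything inside.
cat > first.txt <<'EOF'
the quick brown fox jumps over the lazy dog
the woman snores and her husband wakes up
the sky is blue
EOF

cat > second.txt <<'EOF'
The quick yellow fox jumps over the sleeping dog
The man snores and his wife wakes up
It's raining
EOF
```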

Now let's run wdiff to see the differences:

$ wdiff first.txt second.txt
[-the-]{+The+} quick [-brown-] {+yellow+} fox jumps over the [-lazy-] {+sleeping+} dog
[-the woman-]
{+The man+} snores and [-her husband-] {+his wife+} wakes up
[-the sky is blue-]
{+It's raining+}

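wdiff tells us which words we need to change in the first file so that it matches the second one. Square brackets, [-word-], mark the words we need to delete from the first file, and curly braces, {+word+}, mark the words we need to add to it.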
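Because the markers follow a fixed pattern, the changed words can be pulled out with standard text tools. The sketch below is illustrative: the function names deleted_words and inserted_words are hypothetical, and it assumes a grep that supports -o (GNU and BSD grep both do). It works on a line copied from the output above, so it runs without wdiff installed:

```shell
#!/bin/sh
# Print the deleted words: the text between "[-" and "-]", one match per line.
deleted_words() {
  grep -o '\[-[^]]*-]' | sed 's/^\[-//; s/-]$//'
}

# Print the inserted words: the text between "{+" and "+}", one match per line.
inserted_words() {
  grep -o '{+[^}]*+}' | sed 's/^{+//; s/+}$//'
}

# A line copied from the wdiff output above.
line='[-the-]{+The+} quick [-brown-] {+yellow+} fox jumps over the [-lazy-] {+sleeping+} dog'
printf '%s\n' "$line" | deleted_words    # prints: the, brown, lazy (one per line)
printf '%s\n' "$line" | inserted_words   # prints: The, yellow, sleeping (one per line)
```

In practice, the same functions could read wdiff's output directly, for example wdiff first.txt second.txt | deleted_words. Multi-word markers such as [-her husband-] come out as a single line.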

3. Filtering the Output

If we add the -1 option, the output won't include the words we need to delete from the first file:

$ wdiff -1 first.txt second.txt
{+The+} quick {+yellow+} fox jumps over the {+sleeping+} dog
{+The man+} snores and {+his wife+} wakes up
{+It's raining+}

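Note that the output no longer contains any square-bracket markers. If we add -2 instead, the output won't include the words we need to add to the first file: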

$ wdiff -2 first.txt second.txt
[-the-] quick [-brown-] fox jumps over the [-lazy-] dog
[-the woman-] snores and [-her husband-] wakes up
[-the sky is blue-]

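Finally, if we add -3, the output won't include the words the two files have in common, and each changed region is separated by a line of equals signs: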

$ wdiff -3 first.txt second.txt
======================================================================
[-the-]{+The+}
======================================================================
 [-brown-] {+yellow+}
======================================================================
 [-lazy-] {+sleeping+}
======================================================================
[-the woman-]
{+The man+}
======================================================================
 [-her husband-] {+his wife+}
======================================================================
[-the sky is blue-]
{+It's raining+}
======================================================================
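Since each of these options suppresses only one category of words, they can be combined. As a minimal sketch, the hypothetical helper inserted_only below passes both -1 (hide deleted words) and -3 (hide common words), which should leave only the words that were added in the second file:

```shell
#!/bin/sh
# Hypothetical helper: show only the words added in the second file.
# -1 suppresses deleted words and -3 suppresses common words,
# so only the {+inserted+} regions remain in the output.
inserted_only() {
  wdiff -1 -3 "$1" "$2"
}

# Usage (requires wdiff and the two sample files):
#   inserted_only first.txt second.txt
```

By the same reasoning, passing -2 and -3 together would leave only the deleted words.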

4. Ignoring Case

By default, wdiff is case-sensitive. To make the comparison case-insensitive, we can add the --ignore-case option:

$ wdiff --ignore-case first.txt second.txt
The quick [-brown-] {+yellow+} fox jumps over the [-lazy-] {+sleeping+} dog
The [-woman-] {+man+} snores and [-her husband-] {+his wife+} wakes up
[-the sky is blue-]
{+It's raining+}

We can see that words differing only in capitalization, such as the and The, are no longer reported as changes.

5. Coloring the Output

The raw output of wdiff can be hard to read. However, the colordiff package can colorize wdiff's output and make it easier to understand.

colordiff isn't installed by default either, so we need to install it:

$ sudo apt install colordiff

Once it's installed, we can pipe the output of wdiff to colordiff:

$ wdiff first.txt second.txt | colordiff

[Image: wdiff output colorized by colordiff]

The words printed in red are the ones we need to delete from the first file, and the words printed in green are the ones we need to add to it.
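If colordiff isn't available, wdiff can also emit color codes on its own, because it lets us replace its markers with arbitrary strings through -w/--start-delete, -x/--end-delete, -y/--start-insert, and -z/--end-insert. The sketch below assumes a terminal that understands ANSI escape codes; the function name wdiff_color is hypothetical, and red/green simply mirror the convention above:

```shell
#!/bin/sh
# Hypothetical helper: color wdiff output with ANSI escape codes
# instead of the default [- -] and {+ +} markers.
wdiff_color() {
  red=$(printf '\033[31m')    # begins deleted text
  green=$(printf '\033[32m')  # begins inserted text
  reset=$(printf '\033[0m')   # ends either region
  wdiff -w "$red" -x "$reset" -y "$green" -z "$reset" "$@"
}

# Usage: wdiff_color first.txt second.txt
```

Since the bracket markers are replaced entirely, color becomes the only cue. Redirecting this output to a file would therefore leave raw escape sequences, so the plain wdiff format is the better choice for saving or further processing.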