A new feature in mg


I made the change back last year, but the mg I use now has a search feature that is completely different from anything it had before.


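This is not specific to mg: grep-style commands make it hard to write the kind of query you would give a search engine. For example, finding the lines that contain both foo and bar is surprisingly awkward. The usual approach is probably a pipe like this: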


% grep foo file | grep bar


To do this in a single mg invocation, you would write something like this:


% mg -e 'foo.*bar|bar.*foo'
As the number of strings grows, though, the pattern becomes extremely complicated.


With lookahead, it can also be written like this:


% mg -e '(?=.*foo)(?=.*bar)'
This is a rather pleasant way to do it.
To additionally exclude lines that contain buz, write this:

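% mg -e '^(?=.*foo)(?=.*bar)(?!.*buz)'
The leading ^ is needed here: without it, the match can be retried from a later position in the line where buz no longer lies ahead, and the line is wrongly accepted.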
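The behaviour of the lookahead pattern can be checked outside mg. The sketch below uses Python's `re` module as a stand-in for the Perl regular expressions behind `mg -e`, which is an assumption of this example; lookahead assertions behave the same way in both for this case. It runs the pattern with and without a leading `^`, because an unanchored negative lookahead can be retried at a later position in the line.

```python
import re

# Lookahead pattern anchored at the start of the line.
anchored = re.compile(r"^(?=.*foo)(?=.*bar)(?!.*buz)")
# The same pattern without the anchor, for comparison.
unanchored = re.compile(r"(?=.*foo)(?=.*bar)(?!.*buz)")

lines = ["foo then bar", "bar then foo", "buz foo bar", "only foo"]

hits_anchored = [s for s in lines if anchored.search(s)]
hits_unanchored = [s for s in lines if unanchored.search(s)]

print(hits_anchored)
print(hits_unanchored)
```

In the unanchored case, "buz foo bar" is accepted: the search fails at position 0, then succeeds at position 1, where only "uz foo bar" lies ahead.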


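So I implemented a feature that lets you specify this much more simply.
I originally added it because I needed it, but I have had few occasions to use it since, so it has not been released yet.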


EXTENDED PATTERN SEARCH
Mg now supports a completely new search method. This feature is
invoked by the -x flag, and the specified pattern is then interpreted
in a different manner.

The pattern string is treated as a collection of tokens separated by
white space. Each component is searched for independently, but only
lines which contain all of them are printed. For example,

mg -xp 'foo bar buz' ...

will print lines which contain all of `foo', `bar' and `buz'. They can
be found in any order and at any place in the line. So this command
finds all of the following texts.

foo bar buz
buz bar foo
the foo, bar and buz
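
The AND rule described above can be sketched in a few lines of Python. This is only an illustration of the stated semantics, not mg's implementation; the function name `contains_all` is made up for the example, and tokens are treated as literal strings.

```python
def contains_all(pattern: str, line: str) -> bool:
    """Return True if every whitespace-separated token occurs in line."""
    return all(token in line for token in pattern.split())

samples = [
    "foo bar buz",
    "buz bar foo",
    "the foo, bar and buz",
    "foo and bar only",
]

# Keep only the lines in which all three tokens appear, in any order.
selected = [s for s in samples if contains_all("foo bar buz", s)]
print(selected)
```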

If you want to use OR syntax, prepend a question mark (`?') to each
token, or use a regular expression in the pattern:

mg -xp 'foo bar buz ?yabba ?dabba ?doo'
mg -exp 'foo bar buz yabba|dabba|doo'

Either command will print lines which contain all of `foo', `bar' and
`buz' and one or more of `yabba', `dabba' or `doo'. Note that you
need to use the -e option to enable regular expression interpretation.

Please be aware that multiple tokens preceded by `?' are all pooled
together. That means `?A|B ?C|D' is equivalent to `?A|B|C|D'. If you
want to mean `(A or B) and (C or D)', use AND syntax instead of OR:
`A|B C|D'.
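
The difference between pooled OR tokens and AND tokens can be seen on a line that contains only `C'. The following Python fragment is a sketch of the rule as described here, with Python's `re` standing in for mg's regular expressions; it is not taken from mg itself.

```python
import re

line = "only C here"

# `?A|B ?C|D': all `?' tokens are pooled, so one hit anywhere is enough.
pooled = any(re.search(tok, line) for tok in ["A|B", "C|D"])
# `?A|B|C|D': the single-token spelling of the same pool.
single = re.search("A|B|C|D", line) is not None
# `A|B C|D': two AND tokens, each of which must hit.
both = all(re.search(tok, line) for tok in ["A|B", "C|D"])

print(pooled, single, both)
```

The two pooled forms accept the line, while the AND form rejects it because neither `A' nor `B' is present.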

The NOT operator can be specified by prefixing the token with a minus
(`-') sign. The next example will show lines which contain both `foo'
and `bar' but none of `yabba', `dabba' or `doo'.

mg -xp 'foo bar -yabba -dabba -doo'
mg -exp 'foo bar -yabba|dabba|doo'

It is ok to put a plus (`+') sign before a positive AND token, but it
has no effect.
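
Taken together, the prefix rules amount to sorting tokens into three groups. The sketch below is one possible reading of those rules in Python, not mg's actual code: the names `parse` and `matches` are hypothetical, the `regex` argument plays the role of -e, and Python's `re` stands in for mg's regular expressions.

```python
import re

PREFIXES = {"?": "or", "-": "not", "+": "and"}

def parse(pattern: str, regex: bool = False) -> dict:
    """Sort whitespace-separated tokens into AND, OR and NOT groups."""
    groups = {"and": [], "or": [], "not": []}
    for token in pattern.split():
        kind = PREFIXES.get(token[0])
        body = token[1:] if kind else token
        if not body:
            continue  # a bare prefix character carries no pattern
        # Without the regex flag the token is matched literally.
        source = body if regex else re.escape(body)
        groups[kind or "and"].append(re.compile(source))
    return groups

def matches(pattern: str, line: str, regex: bool = False) -> bool:
    g = parse(pattern, regex)
    if not all(r.search(line) for r in g["and"]):
        return False
    # OR tokens are pooled: one hit from the whole pool is enough.
    if g["or"] and not any(r.search(line) for r in g["or"]):
        return False
    return not any(r.search(line) for r in g["not"])

print(matches("foo bar buz ?yabba ?dabba ?doo", "foo bar buz doo"))
print(matches("foo bar -yabba|dabba|doo", "foo bar doo", regex=True))
```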

When executed with the -o option, paragraphs which contain all these
components will be found. This style is much more useful, actually.
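
Paragraph-wise matching can be pictured as applying the same AND test to larger units. The Python sketch below assumes that paragraphs are separated by blank lines, which is an assumption of this example rather than something stated here about -o; the function name `matching_paragraphs` is hypothetical.

```python
import re

def matching_paragraphs(pattern: str, text: str) -> list:
    """Return the paragraphs that contain every token of pattern."""
    tokens = pattern.split()
    # Assumption: a paragraph ends at one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [p for p in paragraphs if all(t in p for t in tokens)]

text = """foo is on this line
and bar is on the next one.

This paragraph has foo only.
"""

result = matching_paragraphs("foo bar", text)
print(result)
```

The first paragraph is found even though `foo' and `bar' sit on different lines, which a line-by-line search would miss.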

You can't use double quotes to include white space within a token.
Separate options are provided to specify each component individually:
--and, --or and --not. These options can be used multiple times.

mg --and 'foo bar' --and buz --or 'yabba dabba' --or doo ...
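
Repeatable options sidestep the white space problem because each argument arrives as one token. The following is a rough Python illustration using `argparse`; it is not how mg parses its options, the program name `mg-sketch` is made up, and it assumes that --or tokens are pooled in the same way as `?' tokens.

```python
import argparse

parser = argparse.ArgumentParser(prog="mg-sketch")
# `and', `or' and `not' are Python keywords, so each option gets an explicit dest.
for name in ("and", "or", "not"):
    parser.add_argument(f"--{name}", dest=f"{name}_tokens",
                        action="append", default=[])

args = parser.parse_args(
    ["--and", "foo bar", "--and", "buz", "--or", "yabba dabba", "--or", "doo"]
)

def matches(line: str) -> bool:
    """Apply the collected AND, OR and NOT tokens as literal strings."""
    if not all(t in line for t in args.and_tokens):
        return False
    if args.or_tokens and not any(t in line for t in args.or_tokens):
        return False
    return not any(t in line for t in args.not_tokens)

print(args.and_tokens, args.or_tokens, args.not_tokens)
print(matches("buz, foo bar, doo"))
```

Here `foo bar' is a single AND token, so a line containing `foo' and `bar' apart from each other does not match.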

The long option --xp is equivalent to the combination of the -x and
(optional) -p options. The author decided to override the
single-character option -x to make it possible to use it in this way:

mg -oeiQxp 'foo bar buz' ...