I am trying to grep with a regex that contains the pipe character |. However, it doesn't work as expected. The match does not include the |, as seen in the attached image below.
This is my bash command:
cat data | grep "{{flag\|[a-z|A-Z\s]+}}"
The sample data is as follows:
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' {{Kosovo-note}}
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
|{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:
The expected output is:
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
However, when I tested it on Regex101.com, the result came out as expected.
Answer
It appears that grep accepts \| as a separator between alternative search expressions (like | in egrep, where \| matches a literal |).
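The difference can be seen with a small experiment. This is only a sketch and assumes GNU grep, where \| as alternation in basic mode is an extension that other grep implementations may not support; the three input lines are made up for illustration.

```shell
# Assumes GNU grep; the input lines below are illustrative only.
sample='{{flag|Kosovo}}
{{flag}}
plain text'

# Basic mode: an unescaped | is just a literal pipe character.
literal=$(printf '%s\n' "$sample" | grep 'flag|')

# Basic mode: \| means "or", so any line containing "flag" or "plain" matches.
alternation=$(printf '%s\n' "$sample" | grep 'flag\|plain')

printf 'literal pipe:\n%s\n\nalternation:\n%s\n' "$literal" "$alternation"
```

With the pattern from the question, this means grep looks for either {{flag or the bracket expression, which would explain why the flagicon lines are matched as well.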
Apart from that, your expression has other problems:-
- + is supported in egrep (or grep -E) only.
- \s is not supported within a [] character group.
- I don't see the need for | in the character group.
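The \s point is easy to check. The sketch below assumes a grep that follows the POSIX rule that a backslash inside [] is an ordinary character, and it uses the POSIX class [[:space:]] as one possible replacement; the answer's own fix simply puts a literal space in the brackets instead.

```shell
# 'a b' contains a space, but no backslash and no letter s.
# grep -c prints the number of matching lines; '|| true' only hides
# grep's non-zero exit status when that count is 0.
with_backslash_s=$(printf 'a b\n' | grep -c '[\s]' || true)
with_posix_class=$(printf 'a b\n' | grep -c '[[:space:]]' || true)
with_plain_space=$(printf 'a b\n' | grep -c '[ ]' || true)

printf '[\\s]: %s  [[:space:]]: %s  [ ]: %s\n' \
  "$with_backslash_s" "$with_posix_class" "$with_plain_space"
```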
So the following works for grep:-
grep "{{flag|[a-zA-Z ][a-zA-Z ]*}}"
Or (thanks to Glenn Jackman's input):-
grep "{{flag|[a-zA-Z ]\+}}"
In egrep the {} characters have special significance, so they need to be escaped:-
egrep "\{\{flag\|[a-zA-Z ]+\}\}"
Note that I have removed the unnecessary use of cat.
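To try the three variants against the sample data from the question, something like the following should do. It is a sketch that assumes GNU grep (the \+ form is a GNU extension), uses grep -E in place of egrep, and writes the data to a temporary file created with mktemp purely for the demonstration.

```shell
# Recreate the sample data from the question in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' {{Kosovo-note}}
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
|{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:
EOF

# Basic mode, "one or more" spelled out as a character followed by *.
basic_star=$(grep '{{flag|[a-zA-Z ][a-zA-Z ]*}}' "$data")

# Basic mode with \+ (GNU extension).
basic_plus=$(grep '{{flag|[a-zA-Z ]\+}}' "$data")

# Extended mode: braces and the pipe are escaped to make them literal.
extended=$(grep -E '\{\{flag\|[a-zA-Z ]+\}\}' "$data")

# The file name is passed straight to grep, so no cat is needed.
printf '%s\n' "$extended"

rm -f "$data"
```

All three commands should select only the Central African Republic line.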