I am trying to grep with a regex that contains the pipe character |. However, it doesn't work as expected: the regex does not match the literal |, as seen in the attached image below.
This is my bash command:
cat data | grep "{{flag\|[a-z|A-Z\s]+}}"
The sample data is the following:
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' {{Kosovo-note}}
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
|{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:
The expected output is:
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
However, when I tested the same regex on Regex101.com, the result came out as expected.
Answer
It appears that grep treats \| as a separator between alternative search expressions (just as | does in egrep, where \| instead matches a literal |). So to match a literal | with plain grep, leave it unescaped.
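To see the difference for yourself, here is a small sketch. It assumes GNU grep, since \| as alternation in a basic regex is a GNU extension and other grep implementations may behave differently. The three-line sample input and the variable names are made up for illustration.

```shell
# Assumption: GNU grep. The sample input below is hypothetical.
sample=$(printf 'a|b\na\nb\n')

# Basic mode: \| means "or", so any line containing a or b matches.
bre_alternation=$(printf '%s\n' "$sample" | grep 'a\|b')

# Basic mode: an unescaped | is an ordinary character.
bre_literal=$(printf '%s\n' "$sample" | grep 'a|b')

# Extended mode (grep -E / egrep): the roles are swapped, \| is the literal pipe.
ere_literal=$(printf '%s\n' "$sample" | grep -E 'a\|b')

echo "basic, escaped pipe:"
printf '%s\n' "$bre_alternation"
echo "basic, plain pipe:"
printf '%s\n' "$bre_literal"
echo "extended, escaped pipe:"
printf '%s\n' "$ere_literal"
```

Only the first command prints all three lines; the other two print just the line that really contains a|b.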
Apart from that, your expression has other problems:-
- + is supported in egrep (or grep -E) only.
- \s is not supported within a [] character group.
- I don't see the need for | in the character group.
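The \s point is easy to check. The sketch below assumes a POSIX-style grep (tested mentally against GNU grep); the two-line input is made up for illustration. Inside brackets the backslash is just a character, so [\s] means "a backslash or the letter s", not whitespace.

```shell
# Hypothetical input: one line with a space, one with the letter s.
input=$(printf 'a b\nasb\n')

# [\s] inside brackets matches a literal backslash or a literal s.
bracket_s=$(printf '%s\n' "$input" | grep 'a[\s]b')

# [[:space:]] is the POSIX character class for whitespace inside brackets.
bracket_space=$(printf '%s\n' "$input" | grep 'a[[:space:]]b')

echo "with [\\s]:        $bracket_s"
echo "with [[:space:]]: $bracket_space"
```

The first pattern picks out asb and the second picks out a b. In the commands here a plain space in the group is enough, because the country names only contain spaces.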
So the following works for grep:-
grep "{{flag|[a-zA-Z ][a-zA-Z ]*}}" data
Or (thanks to Glenn Jackman's input):-
grep "{{flag|[a-zA-Z ]\+}}" data
In egrep the {} characters have special significance, so they need to be escaped:-
egrep "\{\{flag\|[a-zA-Z ]+\}\}" data
Note that I have removed the unnecessary use of cat.
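If you want to check all three variants against the sample data from the question, something like the following should do. It is only a sketch: it assumes GNU grep (for \+), writes the sample lines to a temporary file created with mktemp instead of your real data file, and uses grep -E in place of egrep, which is equivalent.

```shell
# Recreate the question's sample data in a temporary file (stand-in for "data").
data=$(mktemp)
cat > "$data" <<'EOF'
| 155||NA||{{flag|Central African Republic}}||2.693||NA||0.000||0.000||0.019||0.271||0.281||0.057||2.066
|{{flagicon|Kosovo}} ''[[Kosovo]]'' {{Kosovo-note}}
|{{flagicon|Somaliland}} [[Somaliland|Somaliland region]]
|{{flagicon|Palestine}} ''[[Palestinian Territories]]''{{refn|See the following on statehood criteria:
EOF

# Basic regex, "one or more" spelled out as X followed by X*.
bre_star=$(grep '{{flag|[a-zA-Z ][a-zA-Z ]*}}' "$data")

# Basic regex with \+ (a GNU extension).
bre_plus=$(grep '{{flag|[a-zA-Z ]\+}}' "$data")

# Extended regex: braces and the pipe are escaped to make them literal.
ere_plus=$(grep -E '\{\{flag\|[a-zA-Z ]+\}\}' "$data")

rm -f "$data"

printf '%s\n' "$bre_star" "$bre_plus" "$ere_plus"
```

Each variant should print only the Central African Republic line, because the flagicon lines have an i where the pattern requires the |.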
