How to copy multiple-line-regex outputs into clipboard using Notepad++

Saturday, September 2, 2017

I have a fasta file containing genome sequences of multiple viruses.

Example:

>gi_138375030_Human_papillomavirus
GAAAGTTTCAATCATACTTTATTATATTGGGAGTAAAAAAAA...


>gi_94481944_Human_herpesvirus_3
GGCCCAGCCCTCTCGCGGCCCCCTCGAGAGAGAAAAAAA...

I want to extract only the herpes virus entries, including the actual sequence, which (in this file) is always the line following the description.

The following regex works:

>.*herpes.*\n.*\n

It selects the description and the sequence lines.
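
For anyone who wants to check what this pattern selects outside the editor, here is a minimal sketch in Python (Python is an assumption on my part, not something the question asks for). The sample text reuses the two entries from the example, with the sequences truncated as shown there and the trailing dots left off.

```python
import re

# Sample data: the two entries from the example above.
# Sequences are truncated exactly as in the post (trailing "..." omitted).
fasta = (
    ">gi_138375030_Human_papillomavirus\n"
    "GAAAGTTTCAATCATACTTTATTATATTGGGAGTAAAAAAAA\n"
    ">gi_94481944_Human_herpesvirus_3\n"
    "GGCCCAGCCCTCTCGCGGCCCCCTCGAGAGAGAAAAAAA\n"
)

# Same pattern as in the question. In Python, "." does not match "\n"
# by default, so each ".*" stays within a single line.
pattern = re.compile(r">.*herpes.*\n.*\n")

# findall returns the full text of every match (header line + sequence line).
matches = pattern.findall(fasta)

# Joining the matches gives the text that would go into the new file.
herpes_only = "".join(matches)
print(herpes_only, end="")
```

With the sample above, only the herpesvirus header and its sequence line are printed. Note that the pattern expects Unix line endings and a newline after the last sequence; a file saved with Windows line endings would still match, because "." also matches the carriage return.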

I have found similar questions but all make use of the "bookmark line" function:
Export all regular expression matches in Textpad or Notepad++ as a list

However, this only bookmarks the first line of the regex output, so I am unable to use the described solutions. If I use "find all in current document", it also only lists the first lines.

All I want to do is copy the regex matches into a new file. It is especially frustrating because there are just over a hundred matches, which is slightly more than I would be willing to copy by hand.

I would prefer a solution that works on Windows.

Answer

You could make a copy of the file and then, on the copy, search and replace the negation of what you want:

(?!>.*herpes.*)^(>.*\R)([ATGC]+\R)

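The above will (or ought to) find each pair of header and sequence lines whose header does not contain herpes. Run it in Regular expression search mode with ". matches newline" unchecked, leave the replace field blank, and you will wind up with a file that contains only what you are looking for.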
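
The same remove-everything-else idea can also be tried in a script. This is a rough sketch in Python, not the Notepad++ procedure itself, and it makes a few assumptions: Python's re module does not support \R, so \r?\n is substituted; (?m) is added so that ^ matches at the start of every line; and the sample text is the two truncated entries from the question without the trailing dots.

```python
import re

# Sample data taken from the question (sequences truncated as shown there).
fasta = (
    ">gi_138375030_Human_papillomavirus\n"
    "GAAAGTTTCAATCATACTTTATTATATTGGGAGTAAAAAAAA\n"
    ">gi_94481944_Human_herpesvirus_3\n"
    "GGCCCAGCCCTCTCGCGGCCCCCTCGAGAGAGAAAAAAA\n"
)

# Adaptation of the pattern above for Python's re module:
#   (?m)              ^ matches at the start of each line
#   (?!>.*herpes)     the header on this line must NOT contain "herpes"
#   >.*\r?\n          the header line (Windows or Unix line ending)
#   [ATGC]+\r?\n      the sequence line that follows it
non_herpes = re.compile(r"(?m)^(?!>.*herpes)>.*\r?\n[ATGC]+\r?\n")

# Replacing every non-herpes pair with nothing leaves only herpes entries.
herpes_only = non_herpes.sub("", fasta)
print(herpes_only, end="")
```

As with the editor version, a sequence line containing anything other than A, T, G or C (for example N) would not be matched by [ATGC]+ and its entry would be left in the output, so the character class may need widening for real data.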
