FASTA to 2-line FASTA
Level: Intermediate (score: 3)
A very simple format to store biological sequence data is the (multi-)FASTA format.
The first line of each record (the header) starts with a > character followed by a name. The following lines contain the sequence. A record ends when the next > character or the end of the file is encountered.
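To make these record rules concrete, here is a minimal pure-Python sketch of a parser that follows them. The name parse_fasta is hypothetical (not part of any library). The sketch assumes that blank lines can be skipped and that any sequence lines before the first header can be ignored.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from (multi-)FASTA lines.

    Hypothetical helper: the header is returned without the leading '>',
    and all sequence lines of a record are concatenated into one string.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new '>' ends the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
        # Assumption: sequence lines before the first header are ignored.
    # The end of the input ends the last record.
    if header is not None:
        yield header, "".join(chunks)
```

For example, list(parse_fasta([">seq1", "ACGT", "TTGA", ">seq2", "GGCC"])) gives [("seq1", "ACGTTTGA"), ("seq2", "GGCC")]. Because it is a generator, it also works on an open file handle without loading the whole file.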
FASTA files downloaded from public databases such as the National Center for Biotechnology Information (NCBI) often contain a line break after every 60-80 characters, which keeps long sequences from being truncated in text editors.
However, in many cases (think *nix command-line tools such as grep or wc), it is better if each sequence is exactly one line long.
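One-line sequences help because line-oriented tools never look across a line break. The sketch below mimics a plain grep with a hypothetical helper, grep_lines. It searches the first 70 bases of Sequence 1 from the example in this Bite for the motif CTTGAAATTG, which happens to straddle the line break in the multiline layout.

```python
def grep_lines(pattern: str, text: str) -> list:
    """Hypothetical helper: return the lines of `text` that contain `pattern`,
    similar to what a plain `grep pattern` prints."""
    return [line for line in text.splitlines() if pattern in line]


multiline = (
    ">Sequence 1:\n"
    "ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGA\n"
    "AATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT\n"
)
two_line = (
    ">Sequence 1:\n"
    "ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT\n"
)

motif = "CTTGAAATTG"  # the first 5 bases end one line, the last 5 start the next
print(grep_lines(motif, multiline))      # [] : the match is split by the line break
print(len(grep_lines(motif, two_line)))  # 1  : the single sequence line matches
```

The same property helps with wc -l: the line count of a 2-line FASTA file is exactly twice its number of records.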
Your job is to convert a multiline FASTA file to a 2-line FASTA file.
Multiline FASTA format:
>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGA
AATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT
[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTAT
TACACAATTAAATGACACATTAAAAGCTATTTCAC
[...]
2-Line FASTA format:
>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCAC[...]
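One way to do the conversion without Biopython is sketched below: copy each header line unchanged and join the sequence lines that follow it. The name to_two_line is hypothetical. The sketch assumes the whole file fits in memory as one string and strips surrounding whitespace from every line. A record without any sequence lines still gets an empty second line, so the output stays strictly two lines per record.

```python
def to_two_line(fasta_text: str) -> str:
    """Convert (multiline) FASTA text to 2-line FASTA text.

    Hypothetical helper: header lines are copied as-is, and the sequence
    lines belonging to each header are joined into a single line.
    """
    out = []         # finished output lines
    header = None    # header of the record currently being collected
    seq_parts = []   # sequence lines of that record
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # A new header ends the previous record: emit it as two lines.
            if header is not None:
                out.extend([header, "".join(seq_parts)])
            header, seq_parts = line, []
        elif line and header is not None:
            seq_parts.append(line)
    # The end of the text ends the last record.
    if header is not None:
        out.extend([header, "".join(seq_parts)])
    return "".join(f"{line}\n" for line in out)
```

Applied to the two example records above (without the [...] parts), it produces exactly the 2-line layout shown.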
This Bite has Biopython enabled (check out the convert function of the Bio.SeqIO module), but it can also be solved without this module.
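With Biopython, Bio.SeqIO.convert(in_file, in_format, out_file, out_format) does the conversion in one call and returns the number of records. Its "fasta-2line" output format writes each sequence on a single line; this format exists only in reasonably recent Biopython releases. The sketch below uses the hypothetical name convert_file. It calls SeqIO.convert when Biopython can be imported and otherwise falls back to a streaming plain-Python loop, so it also runs where the module is missing.

```python
def convert_file(in_path: str, out_path: str) -> int:
    """Write a 2-line FASTA copy of in_path to out_path; return the record count.

    Hypothetical helper: prefers Biopython's Bio.SeqIO.convert with the
    "fasta-2line" output format, and falls back to plain Python otherwise.
    """
    try:
        from Bio import SeqIO
    except ImportError:
        SeqIO = None

    if SeqIO is not None:
        # Assumes an installed Biopython version that knows "fasta-2line".
        return SeqIO.convert(in_path, "fasta", out_path, "fasta-2line")

    # Fallback: stream line by line, so large files need not fit in memory.
    count = 0
    in_record = False
    with open(in_path) as src, open(out_path, "w") as dst:
        for raw in src:
            line = raw.strip()
            if line.startswith(">"):
                if in_record:
                    dst.write("\n")  # close the previous record's sequence line
                dst.write(line + "\n")
                in_record = True
                count += 1
            elif line and in_record:
                dst.write(line)  # append the chunk without a line break
        if in_record:
            dst.write("\n")  # close the last record's sequence line
    return count
```

For the example input, both code paths are expected to write the same two records. Returning the count mirrors the return value of SeqIO.convert.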