Pybites Platform

Fasta to 2-line fasta

schustercf

Level: Intermediate (score: 3)

A very simple format to store biological sequence data is the (multi-)FASTA format.

The first line of each record starts with a > character, followed by a name. The following lines contain the sequence information. A record ends when the next > character or the end of the file is encountered.
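The record structure described above can be sketched as a small parser. This is a minimal illustration, not the Bite's reference solution: the function name iter_fasta_records is hypothetical, and it assumes that blank lines carry no data and that any text before the first > line can be ignored.

```python
from typing import Iterable, Iterator, Tuple


def iter_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a (multi-)FASTA file."""
    name = None   # name of the record currently being read
    chunks = []   # sequence lines collected for that record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are not part of any record
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    # The end of the input closes the last record.
    if name is not None:
        yield name, "".join(chunks)


example = """>Sequence 1:
ATGTCG
GAAAAA
>Sequence 2:
ATGATG
"""
records = list(iter_fasta_records(example.splitlines()))
print(records)
```

Each yielded sequence is already joined into a single string, so the line breaks of the input no longer matter to the caller.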

FASTA files downloaded from public databases such as the National Center for Biotechnology Information (NCBI) often contain a line break after every 60-80 characters, which ensures that sequences are not truncated in text editors.

However, in many cases (think *nix command-line tools such as grep and wc), it is better if each sequence is exactly one line long.

Your job is to convert a multiline FASTA file to a 2-Line FASTA file.

Multiline FASTA format:

>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGA
AATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT
[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTAT
TACACAATTAAATGACACATTAAAAGCTATTTCAC
[...]

2-Line FASTA format:

>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCAC[...]
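One way to perform this conversion without biopython is to stream the input line by line, copying header lines unchanged and writing sequence lines without their line breaks. The sketch below is only an illustration under stated assumptions: the name fasta_to_2line and its stream-based signature are hypothetical (the Bite's actual function signature may differ), blank lines are skipped, and every record is assumed to have at least one sequence line.

```python
import io
from typing import TextIO


def fasta_to_2line(src: TextIO, dst: TextIO) -> int:
    """Copy FASTA records from src to dst in 2-line form; return the record count."""
    records = 0
    in_sequence = False  # True while sequence text still needs a closing newline
    for raw in src:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            if in_sequence:
                dst.write("\n")  # finish the previous record's sequence line
            dst.write(line + "\n")
            in_sequence = False
            records += 1
        elif records:
            # Sequence text is appended without a line break.
            dst.write(line)
            in_sequence = True
    if in_sequence:
        dst.write("\n")
    return records


# Usage with in-memory streams; open file handles work the same way.
src = io.StringIO(">Sequence 1:\nATGTCG\nGAAAAA\n>Sequence 2:\nATGATG\nGAATTC\n")
dst = io.StringIO()
count = fasta_to_2line(src, dst)
print(dst.getvalue(), end="")
```

Because only one line is held in memory at a time, this approach should also cope with files that are too large to load whole.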

This Bite has biopython enabled (check out module Bio.SeqIO's convert function), but it can also be solved without this module.

biopython bioinformatics
