Unique genes

Level: Advanced (score: 4)

You have received a list of DNA sequences for a specific gene in the FASTA file format (see Bite 298).

Your job is to collapse this FASTA file into a new FASTA file that contains only unique gene sequences, with the header of each entry updated to list the locus tags of every record that shares that sequence.

To make things more interesting, the function accepts not only plain FASTA files but also gzipped FASTA files.
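
One possible way to accept both kinds of input is to check the first two bytes of the file for the gzip magic number instead of trusting the file extension. The sketch below is only an illustration: the helper name `open_fasta` and the constant `GZIP_MAGIC` are hypothetical and not part of the bite, and the bite's tests may expect a different detection strategy.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_fasta(path):
    """Open a plain or gzipped FASTA file in text mode (hypothetical helper)."""
    # Peek at the first two bytes to decide how to open the file.
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "r")
```

Sniffing the content rather than the name means a gzipped file without a `.gz` suffix is still read correctly; whether that case matters depends on the tests.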

Watch out for edge cases as specified in the tests.

Example

convert_to_unique_genes("input.fasta", "output.fasta")

input.fasta

>gene [locus_tag=AA11]
AAAAAA
>gene [locus_tag=BB22]
AAAAAA
>gene [locus_tag=CC33]
GAAAAC

output.fasta

>gene [locus_tags=AA11,BB22]
AAAAAA
>gene [locus_tag=CC33]
GAAAAC
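
The collapsing step in the example above can be sketched as grouping locus tags by sequence. This is a minimal sketch under several assumptions, not a reference solution: the names `parse_records` and `collapse_unique` are hypothetical, every header is assumed to look like `>gene [locus_tag=...]`, sequences are compared exactly as written, and raising `ValueError` on a header without a locus tag is a choice made here. The edge cases defined in the bite's tests are not covered.

```python
import re

# Assumption: each header carries exactly one [locus_tag=...] field.
LOCUS_TAG = re.compile(r"\[locus_tag=([^\]]+)\]")


def parse_records(lines):
    """Yield (locus_tag, sequence) pairs from an iterable of FASTA lines."""
    tag, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if tag is not None:
                yield tag, "".join(parts)
            match = LOCUS_TAG.search(line)
            if match is None:
                raise ValueError(f"no locus_tag in header: {line!r}")
            tag, parts = match.group(1), []
        elif line:
            # Sequences may span several lines; join them later.
            parts.append(line)
    if tag is not None:
        yield tag, "".join(parts)


def collapse_unique(lines):
    """Return output FASTA lines with one entry per unique sequence."""
    tags_by_seq = {}  # dicts keep insertion order, so first-seen order is kept
    for tag, seq in parse_records(lines):
        tags_by_seq.setdefault(seq, []).append(tag)

    out = []
    for seq, tags in tags_by_seq.items():
        # Singular key for one tag, plural for merged entries, as in the example.
        key = "locus_tag" if len(tags) == 1 else "locus_tags"
        out.append(f">gene [{key}={','.join(tags)}]")
        out.append(seq)
    return out
```

Working on lines rather than file paths keeps the grouping logic separate from file handling, so the same function could be fed from a plain or a gzipped file handle.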