
Translate coding sequences to proteins

Level: Intermediate (score: 3)

Genes can be converted (translated) into proteins using a three-base (codon) decoding system, as described in Bite 255.
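To make the three-base decoding concrete, here is a minimal pure-Python sketch (it does not use Biopython) that splits a sequence into non-overlapping codons and looks each one up in the standard genetic code (NCBI table 1). The table is built from the usual TCAG-ordered 64-letter amino-acid string. `split_codons` and `translate_frame` are hypothetical helpers for illustration only, and stop codons are simply shown as `*`.

```python
# Standard genetic code (NCBI table 1): one amino acid per codon, codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(seq: str) -> list[str]:
    """Split seq into consecutive triplets, ignoring any incomplete trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate_frame(seq: str) -> str:
    """Translate every complete codon; stop codons appear as '*'."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


print(split_codons("ATGGGGTTTTAA"))     # ['ATG', 'GGG', 'TTT', 'TAA']
print(translate_frame("ATGGGGTTTTAA"))  # MGF*
```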

Your job is to create a function that takes a coding sequence (CDS), i.e. the region of a gene that encodes a protein, excluding any other features, and returns the translated protein (polypeptide) as a str.

For your convenience, Biopython is already installed. Have a look at the Bio.Seq module and locate the correct function to solve this Bite. You will also need to make sure that whitespace in the input is cleaned up properly.
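The Bite does not say exactly where whitespace may appear, so this sketch assumes it can occur anywhere, including spaces, tabs or newlines inside a pasted sequence. `clean_sequence` is a hypothetical helper name. Note that `str.strip()` alone would only remove whitespace at the two ends.

```python
def clean_sequence(raw: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from raw."""
    # str.split() with no argument splits on runs of any whitespace,
    # so joining the pieces also drops whitespace inside the sequence.
    return "".join(raw.split())


messy = "  ATG GGG\nTTT\tTAA \n"
print(clean_sequence(messy))  # ATGGGGTTTTAA
print(repr(messy.strip()))    # 'ATG GGG\nTTT\tTAA'
```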

Although the genetic code is highly conserved, the codon-to-amino-acid mapping differs in certain organisms and cell compartments. Biopython therefore implements several such mappings, called "translation tables", which you can use directly.
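As a rough illustration of why the choice of table matters, the sketch below (plain Python, not Biopython's own tables) derives the vertebrate mitochondrial code (NCBI table 2) from the standard code by overriding the four codons where the two differ, then translates the same sequence with both. `translate_with` is a hypothetical helper; with Biopython you would instead select a ready-made table by name or NCBI id.

```python
BASES = "TCAG"
# Standard code (NCBI table 1): amino acids for all 64 codons in TCAG order.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, STANDARD_AAS))

# Vertebrate mitochondrial code (NCBI table 2): identical except for four codons.
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate_with(table: dict[str, str], seq: str) -> str:
    """Translate every complete codon of seq using table; stops appear as '*'."""
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


seq = "ATGTGAAGA"
print(translate_with(STANDARD, seq))         # M*R
print(translate_with(VERTEBRATE_MITO, seq))  # MW*
```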

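Also make sure you find and set the option for complete coding sequences (CDS). With it enabled, the first codon must be a valid start codon for the chosen table and is translated as the amino acid M (methionine), even when that codon normally encodes something else (TTG, for example, normally encodes L). The final stop codon is also left out of the returned protein.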
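The complete-CDS option can be hard to picture, so here is a toy, pure-Python approximation of the checks Biopython documents for `cds=True`: a valid start codon first (translated as M), a length that is a multiple of three, a single in-frame stop codon at the end (which is dropped), and no internal stops. It hard-codes NCBI table 11 (the Bacterial table) and raises a hypothetical `ToyTranslationError`; it is not Biopython's implementation and only aims to explain the expected outputs.

```python
# Toy model of "complete CDS" translation with NCBI table 11 (Bacterial).
# Illustration only; not Biopython's implementation.
BASES = "TCAG"
# Table 11 uses the same codon-to-amino-acid mapping as table 1 (TCAG order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
TABLE_11 = dict(zip(CODONS, AMINO_ACIDS))
START_CODONS_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}
STOP_CODONS_11 = {"TAA", "TAG", "TGA"}


class ToyTranslationError(Exception):
    """Hypothetical stand-in for Biopython's TranslationError."""


def toy_translate_cds(seq: str) -> str:
    """Translate a complete CDS, enforcing the documented cds=True checks."""
    first = seq[:3]
    if first not in START_CODONS_11:
        raise ToyTranslationError(f"First codon {first!r} is not a start codon")
    if len(seq) % 3:
        raise ToyTranslationError("Sequence length is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[-1] not in STOP_CODONS_11:
        raise ToyTranslationError("Final codon is not a stop codon")
    inner = codons[1:-1]
    if any(codon in STOP_CODONS_11 for codon in inner):
        raise ToyTranslationError("Extra in-frame stop codon found")
    # The start codon always becomes M; the final stop codon is dropped.
    return "M" + "".join(TABLE_11[codon] for codon in inner)


print(toy_translate_cds("ATGGGGTTTTAA"))  # MGF
print(toy_translate_cds("TTGGGGTTTTAA"))  # MGF
```

Under these rules TTG at position 1 becomes M even though it encodes L elsewhere, while ACG is rejected because it is not one of table 11's start codons.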

Examples of how the function should work:

>>> translate_cds("ATGGGGTTTTAA", "Bacterial")
'MGF'
>>> translate_cds("TTGGGGTTTTAA", "Bacterial")
'MGF'
>>> translate_cds("ACGGGGTTTTAA", "Bacterial")
[...]
TranslationError: First codon 'ACG' is not a start codon

Good luck!