
Parse an email header

Level: Advanced (score: 4)

Write a regular expression to extract 4 pieces of information from an email header:

  • From email
  • To email
  • Subject
  • Date sent (without timezone info)
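The date is the trickiest field because the timezone has to be left out. Here is a minimal sketch, assuming the Date value always contains an HH:MM:SS time. It stops the capture right after the seconds, so the offset and the "(UTC)" comment are dropped. The name DATE_LINE is made up for illustration.

```python
import re

# Hypothetical pattern: capture everything after "Date:" up to and including
# the HH:MM:SS time, so the offset ("+0000") and "(UTC)" comment are dropped.
DATE_LINE = re.compile(r"^Date:[ \t]*(?P<date>[^\r\n]+?\d{2}:\d{2}:\d{2})", re.MULTILINE)

lines = [
    "Date: Wed, 19 Jun 2013 17:09:33 +0000 (UTC)",
    "Date: Wed, 19 Jun 2013 12:09:33 -0500",
]
dates = [DATE_LINE.search(line).group("date") for line in lines]
print(dates)  # ['Wed, 19 Jun 2013 17:09:33', 'Wed, 19 Jun 2013 12:09:33']
```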

Use re.match or re.search with named capturing groups, written as (?P<name>...). Return the groupdict() of the resulting match object.
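Named groups are what make groupdict() useful. The sketch below runs on a tiny, made-up two-line header with invented addresses and shows the mechanics: each (?P<name>...) becomes a key in the returned dict. re.MULTILINE lets ^ and $ match at every line boundary.

```python
import re

# A tiny made-up header, only for illustration.
sample = "From: alice@example.com\nTo: bob@example.com\nSubject: Hi"

# (?P<name>...) is a named capturing group; groupdict() maps each name to the
# text it captured. Without re.DOTALL, .+ stops at the end of the line.
pattern = re.compile(r"^From: (?P<from>.+)\nTo: (?P<to>.+)$", re.MULTILINE)

match = pattern.search(sample)
print(match.groupdict())  # {'from': 'alice@example.com', 'to': 'bob@example.com'}
```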

Here is an example of how it works (email header found here; the tests use a different, made-up one):

>>> header = """Return-Path: <bounces+5555-7602-redacted-info>
... ...
... Received: by 10.8.49.86 with SMTP id mf9.22328.51C1E5CDF
...     Wed, 19 Jun 2013 17:09:33 +0000 (UTC)
... Received: from NzI3MDQ (174.37.77.208-static.reverse.softlayer.com [174.37.77.208])
... by mi22.sendgrid.net (SG) with HTTP id 13f5d69ac61.41fe.2cc1d0b
... for ; Wed, 19 Jun 2013 12:09:33 -0500 (CST)
... Content-Type: multipart/alternative;
... boundary="===============8730907547464832727=="
... MIME-Version: 1.0
... From: redacted-address
... To: redacted-address
... Subject: A Test From SendGrid
... Message-ID: <1371661773.974270694268263@mf9.sendgrid.net>
... Date: Wed, 19 Jun 2013 17:09:33 +0000 (UTC)
... X-SG-EID: P3IPuU2e1Ijn5xEegYUQ...
... X-SendGrid-Contentd-ID: {"test_id":"1371661776"}"""
>>> 
>>> from email_header import get_email_details
>>> get_email_details(header)
{'from': 'redacted-address', 'to': 'redacted-address', 'subject': 'A Test From SendGrid',
 'date': 'Wed, 19 Jun 2013 17:09:33'}

Enjoy and keep calm and code in Python!