Amicus SendGrid, sed magis amica veritas

In total control

As Igor mentioned in the previous post, we have recently started working with the team behind a popular freemium project management service. The service generates a lot of email, thousands of messages a day: notifications to users about changes that took place and daily due/overdue reminders for tasks. Users can also update the system by replying to an email they received. As the number of users grew, we noticed our SendGrid bills rising accordingly. For a freemium service, we naturally want to keep them as low as possible. We also wanted to understand how effectively we send email, to make sure we are not paying SendGrid to deliver messages to nowhere.

We found ourselves doing detective work more and more often, resolving mysteries of someone not getting notifications from our system, or someone replying to one of our emails but not seeing their update in the system. Thus, the second problem was our inability to deal effectively with user problems related to email delivery. We constantly had to search and compare our own database with SendGrid’s reports and activity logs. This took a lot of time and often gave no result: once an email was sent via SMTP, we had no link between the record in our database and the outcome on SendGrid’s side.

The situation with incoming mail was similar. For those who have not come across this yet, here is how it works with SendGrid: emails are delivered to SendGrid, parsed there, and sent on to you as a POST request. So when someone’s email update (our client’s reply) does not show up in our service, it is not clear who is to blame: SendGrid failing to parse it, or our receiving script. There is a third option: SendGrid never received it. You will understand us if you have ever been in a situation where the honest answer, “Well, I do not know, damn it, why your emails did not arrive”, cannot be sent to a user, and the only option is to promise to sort it out soon. To keep our promises, I got down to business.

We needed total control over the process of sending and receiving emails. The current process was sending via SendGrid’s SMTP and receiving via SendGrid, with no links whatsoever between our data and SendGrid’s. We also had thousands of emails to send and receive a day and their exponential growth, which meant larger bills, and no effective mechanism for addressing users’ problems. The goal: get as much data as possible out of SendGrid to reduce costs and improve the quality and speed of customer support.

The first thing I did was dig into SendGrid’s documentation on this topic. It turned out that they have two hooks that give us the desired control over email statuses. The first hook is Events: SendGrid sends the events generated in its system to a URL we supply. To identify a specific email, they propose using unique arguments and categories, which can be passed in a special header when sending mail via SMTP. All further updates on the processing of that email include these arguments and categories, which makes it possible to determine exactly which email an event relates to. What are the benefits? First, we always know for sure whether or not an email was delivered to a recipient. Second, we can fully or partially automate re-sending emails that have the status “Deferred” or that bounced and ended up in a Bounce list. We also learn when an email is added to a Bounce list and can quickly take action. There are additional bonuses associated with the use of email categories.

As mentioned above, categories can be assigned to outbound emails, and according to the documentation an email can have up to ten of them attached. These categories are also used in SendGrid reports, where you can compare them with each other on pretty charts. Unfortunately, the category system is flat. A hierarchy of at least two levels would be more practical for us: the id of the application the email originated from as the category, and the action that triggered it as the sub-category. For example, the TODO application sends emails when a user creates, edits or changes a task. With sub-categories we would have a better idea of who generates emails inside a particular application, and how many, split by action. However, actions such as ‘add’, ‘edit’ and ‘delete’ also exist in other applications, so an ‘add’ category on its own is far less useful to us in the statistics than one bound to a specific application.

However, it is quite possible that this one-level category approach will be useful for someone. Now, a few words about implementation. The server side of our application is written with CodeIgniter, and emails are sent with its standard Email library. That library does not allow custom headers to be added when sending messages via SMTP, so we needed to extend it with our own class. Taking the library https://github.com/leonbarrett/CodeIgni … /Email.php as an example, we created our own, cutting out all the parts we did not need and simplifying the rest down to a single method. We actually ended up with two methods, because the _set_header() method declared in class CI_Email is private, which prevents its use in inherited classes. Here is the complete code of the library:

    <?php  if ( ! defined('BASEPATH')) exit('No direct script access allowed');
    /**
     * CodeIgniter
     *
     * An open source application development framework for PHP 5.1.6 or newer
     *
     * @package      CodeIgniter
     * @author      Deep Shift Labs Dev Team
     * @copyright      Copyright (c) 2013, Deep Shift Labs
     * @license      http://codeigniter.com/user_guide/license.html
     * @link      http://codeigniter.com
     * @since      Version 1.0
     * @filesource
     */
 
    // ------------------------------------------------------------------------
 
    /**
     * CodeIgniter Email Class
     *
     * Permits email to be sent using Mail, Sendmail, or SMTP.
     *
     * @package   CodeIgniter
     * @subpackage   Libraries
     * @category   Libraries
     * @author   Deep Shift Labs Dev Team
     * @link   http://deepshiftlabs.com
     */
    class MY_Email extends CI_Email {
        /**
        * Add a Header Item
        *
        * @access  private
        * @param   string
        * @param   string
        * @return  void
        */
        private function _set_header($header, $value)
        {
            $this->_headers[$header] = $value;
        }
 
        /**
        * Add a Sendgrid header
        *
        * @access  public
        * @param   array
        * @return  void
        */
        public function addSendGridHeader($data) {
            $xsmtpapi = json_encode($data);
            // Add spaces so that the field can be folded
            $xsmtpapi = preg_replace('/(["\]}])([,:])(["\[{])/', '$1$2 $3', $xsmtpapi);
 
            $this->_set_header('X-SMTPAPI', $xsmtpapi);
        }
    }

Thus, attaching an id and categories to an email is done by calling the addSendGridHeader() method, like this:

    $this->email->addSendGridHeader(array('unique_args' => array('email_id' => 1), 'category' => array('help', 'question')));
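
To see what the header value looks like, the encoding step can be run outside CodeIgniter. The sketch below copies the two lines that matter from addSendGridHeader() into a standalone function; the name buildSmtpApiValue() is made up for this illustration and is not part of our library.

```php
<?php
// Standalone copy of the encoding step from addSendGridHeader(), for inspection only.
// buildSmtpApiValue() is a hypothetical name used just for this example.
function buildSmtpApiValue(array $data)
{
    $json = json_encode($data);
    // Put a space after "," or ":" between JSON tokens so the long header can be folded.
    return preg_replace('/(["\]}])([,:])(["\[{])/', '$1$2 $3', $json);
}

$value = buildSmtpApiValue(array(
    'unique_args' => array('email_id' => 1),
    'category'    => array('help', 'question'),
));

echo 'X-SMTPAPI: ' . $value . "\n";
// X-SMTPAPI: {"unique_args": {"email_id":1}, "category": ["help", "question"]}
```

Note that no space is added before the number 1: the pattern only matches a separator that sits between a closing quote or bracket and an opening one.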

Let’s get back to hooks. The second hook, Inbound Parse, allows you to receive replies from users. After the appropriate adjustment of your domain’s or subdomain’s MX record, SendGrid starts sending each parsed incoming email to the URL you specify. To our surprise, it turned out that there are no events for incoming emails, so we cannot find out which stage of processing an email is at. We can only hope that this process never fails and is highly redundant. There is no way to find out whether SendGrid really received an email. Taking this into account, all we can do is take what SendGrid exposes and bring it into our database. We created the following statuses for incoming emails: ‘received’, ‘hash_checked’, ‘duplicate’, ‘processed’. Wondering what the ‘hash_checked’ status is for? I hope so, as I am going to tell you now.

In addition to not getting events for incoming emails from SendGrid, we do not get any information at all from them that would help identify the email. Here is a typical workflow. An event generates an outbound email; let’s say a new comment was added to a discussion. A user receives the email notification, hits reply, types a comment and sends it. Now we have an incoming email and need to figure out which discussion to update with the new comment. There are no unique arguments, no categories, nothing: no way to link an inbound reply to the outbound email. It would be nice if SendGrid solved this problem: we would pass unique headers with a sent email and get them back with the reply. This is also part of the big and currently neglected transactional email business Igor mentioned in his post.

We decided to use a special hash in the “Reply-To” header. It looks like this:

“db051171af45b683c50eb3d66017ecf2+s@incoming.servicedomain.com”

This hash is generated in such a way that we always know which record in our system it corresponds to. This could be a task, a discussion comment or something else. When an inbound reply arrives, we determine the record it relates to, and if we are able to do so, we change the status to ‘hash_checked’. Next, we check whether the email is a duplicate, since we do not want to publish exactly the same content more than once. Then a new record containing the inbound reply is created in our system and the status is changed to ‘processed’.
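
The status flow above can be sketched roughly as follows. This is an illustration under several assumptions, not our production code: the function names, the in-memory arrays standing in for database tables, the 'discussion:42' record key and the md5-based duplicate check are all made up for the example. It also assumes the recipient arrives as a bare address and that the hash is simply looked up in a stored mapping.

```php
<?php
// Hypothetical sketch: every name here and the in-memory "tables" are illustrative.

// Pull the 32-character hash out of "<hash>+s@incoming.servicedomain.com".
function extractReplyHash($address)
{
    if (preg_match('/^([0-9a-f]{32})\+[^@]*@/i', trim($address), $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Walk one inbound email through the statuses and return the last one reached.
function processInbound(array $email, array $hashToRecord, array &$seen, array &$comments)
{
    $status = 'received';

    $hash = extractReplyHash($email['to']);
    if ($hash === null || !isset($hashToRecord[$hash])) {
        return $status; // the reply could not be linked to a record
    }
    $status = 'hash_checked';

    // Assumption: identical text sent to the same record counts as a duplicate.
    $fingerprint = md5($hash . "\n" . $email['text']);
    if (isset($seen[$fingerprint])) {
        return 'duplicate';
    }
    $seen[$fingerprint] = true;

    $comments[] = array('record' => $hashToRecord[$hash], 'body' => $email['text']);
    return 'processed';
}

$hashToRecord = array('db051171af45b683c50eb3d66017ecf2' => 'discussion:42');
$seen = array();
$comments = array();
$email = array(
    'to'   => 'db051171af45b683c50eb3d66017ecf2+s@incoming.servicedomain.com',
    'text' => 'Looks good to me',
);

$first  = processInbound($email, $hashToRecord, $seen, $comments); // 'processed'
$second = processInbound($email, $hashToRecord, $seen, $comments); // 'duplicate'
```

Returning the last status reached, rather than a simple true/false, is what lets support see how far a given reply got before it stopped.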

One of the things we wanted to address was the possibility of re-sending emails that were not delivered to users. For this purpose, all the data we need is saved in a database: the message body and the headers, which contain the subject, recipients, our unique parameters and so on. Saving this data was easy; all it took was changing a few lines of existing code by overriding the send() method of the standard Email library. Now we save absolutely everything. However, it makes little sense to keep data for messages that have been delivered successfully, right? So we made minor changes to the script that receives events from SendGrid: when we get a ‘delivered’ event notification, we delete all the data we saved for that email. I say ‘all the data’ because, to keep saving fast, we do not check whether a send is the first attempt or not, so with multiple delivery attempts the same email is saved in the database several times. The logical next step was adding the ability to re-send messages from the database. There were no problems with it, although it required a little tweak in the library I mentioned earlier. If you are interested, please ask in the comments and I’ll expand on it.
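
A minimal sketch of that cleanup step is below. It assumes the event arrives as an array with an 'event' field and with our email_id unique argument as a top-level field; the handleEvent() name and the $stored array, which stands in for the table of saved messages, are hypothetical.

```php
<?php
// Hypothetical sketch: $stored stands in for the table of saved outbound messages,
// keyed by row id; each row carries the email_id we passed in unique_args.
function handleEvent(array $event, array &$stored)
{
    if (!isset($event['event'], $event['email_id'])) {
        return 0; // not an event for one of our tracked emails
    }
    if ($event['event'] !== 'delivered') {
        return 0; // keep the copies: the message may still need to be re-sent
    }

    $removed = 0;
    // Iterate over a snapshot of the keys so rows can be removed safely.
    foreach (array_keys($stored) as $rowId) {
        // Delete every saved copy, however many send attempts were made.
        if ((string) $stored[$rowId]['email_id'] === (string) $event['email_id']) {
            unset($stored[$rowId]);
            $removed++;
        }
    }
    return $removed;
}

$stored = array(
    1 => array('email_id' => 1, 'body' => 'first attempt'),
    2 => array('email_id' => 1, 'body' => 'second attempt'),
    3 => array('email_id' => 2, 'body' => 'another message'),
);

$kept    = handleEvent(array('event' => 'deferred', 'email_id' => 1), $stored);  // nothing removed
$removed = handleEvent(array('event' => 'delivered', 'email_id' => 1), $stored); // both copies of email 1 removed
```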

That, in general, is how we are starting to control outbound and inbound email delivery in our system. Do not hesitate to ask in the comments if something is unclear. We are planning to launch this new subsystem very soon, and after collecting some data we will start optimisation, trying to send fewer emails. The next step will be to automate those cases where a problem is well understood and can be handled automatically. We will share this information with you. If you have already solved a similar problem, share your experience in the comments; that would be awesome!

Alex Savchenko

P.S. While releasing the automated system described above, we ran into an unpleasant situation. Our production and preproduction versions of the system use different SendGrid accounts, so that preproduction does not affect the production statistics. On the 15th of September we launched this approach on the production server and found that, for some reason, we could not get the data from event notifications. After looking into it, we found that on September 6th SendGrid had launched a new version of their Event hook, in which they radically changed the format of the returned data. Versions 1 and 2 used POST variables, but the latest version uses JSON. Oddly enough, they did not warn their paying customers about such a significant change. The Event Notification application on our preproduction account had been activated before the 6th of September, so we tested against version 2 of the hook. Before the release on the 15th we activated notifications in our production account and … were forced to use version 3 of the hook. As a result, we had to postpone our system upgrade in order to change and test the notification parser.
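
One way to avoid being caught out by this is a receiving script that accepts both formats. The sketch below is only an illustration: normalizeEvents() is a hypothetical helper, and it assumes the JSON body is either a single event object or a list of them, while the older format carries one event per request as POST variables.

```php
<?php
// Hypothetical helper: returns a list of events whichever format arrived.
function normalizeEvents($rawBody, array $post)
{
    $decoded = json_decode($rawBody, true);
    if (is_array($decoded)) {
        // JSON format: either a single event object or a list of them.
        return isset($decoded['event']) ? array($decoded) : array_values($decoded);
    }
    // Older format: one event per request, sent as ordinary POST variables.
    return isset($post['event']) ? array($post) : array();
}

// Older style request: the body is form-encoded, so json_decode() yields null.
$old = normalizeEvents(
    'event=delivered&email_id=1',
    array('event' => 'delivered', 'email_id' => '1')
);

// JSON style request: $_POST would be empty and the body holds the events.
$new = normalizeEvents(
    '[{"event":"delivered","email_id":1},{"event":"bounce","email_id":2}]',
    array()
);
```

In a real endpoint the two arguments would be file_get_contents('php://input') and $_POST.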
