  {"id":2332,"date":"2015-11-21T18:15:07","date_gmt":"2015-11-21T23:15:07","guid":{"rendered":"https:\/\/digital.hbs.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/"},"modified":"2015-11-21T18:15:07","modified_gmt":"2015-11-21T23:15:07","slug":"gmail-ensuring-a-spam-free-inbox-with-machine-learning","status":"publish","type":"hck-submission","link":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/","title":{"rendered":"Gmail: ensuring a spam-free inbox with Machine Learning"},"content":{"rendered":"<p>Around 90% of emails sent around the world are spam [1]. Luckily, Google has spent significant time and money on Gmail\u2019s spam filters, making them very effective at keeping the vast majority of these spam emails out of our inboxes.<\/p>\n<p>&nbsp;<\/p>\n<p>How do they do it? By constantly refining their spam filter algorithms.<\/p>\n<p>&nbsp;<\/p>\n<p>Gmail looks at various inputs for each email (see Exhibit): who the sender is (and what is already known about them), what their IP address is, what the content is, etc., and tries to classify it as either \u201cspam\u201d or \u201cnot spam\u201d. If an email is deemed \u201cnot spam\u201d, it shows up in the user\u2019s inbox \u2013 otherwise it\u2019s sent directly to the \u201cspam\u201d folder. But then, importantly, Gmail also monitors users\u2019 behaviors to refine its filter\u2019s decision-making process. By monitoring users\u2019 engagement with the email, it can close the feedback loop and improve its predictive power. If, for example, emails with the words \u201cMagic pill\u201d and \u201cweight loss\u201d tend to be flagged by users as \u201cspam\u201d, or are just deleted as soon as they are received, the Gmail team might decide to penalize emails with those words, increasing the likelihood of blocking them. 
In fact, in all likelihood, the Gmail team doesn\u2019t make such individual decisions, but rather designs rules so that the algorithm picks up on these patterns on its own. Ever pushing to improve its filter, this year Gmail announced that it will expand the list of things its filter looks at to include each recipient\u2019s taste, as inferred from their Gmail activity history (e.g., what they\u2019re generally interested in) [2]. Oh, and if this smells like direct network effects to you, you\u2019re not mistaken.<\/p>\n<p><a href=\"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/gmail-spam-filter.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-2331 aligncenter\" src=\"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/gmail-spam-filter.png\" alt=\"gmail spam filter\" width=\"756\" height=\"396\" srcset=\"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/gmail-spam-filter.png 1605w, https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/gmail-spam-filter-300x157.png 300w, https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/gmail-spam-filter-1024x537.png 1024w, https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/gmail-spam-filter-600x314.png 600w\" sizes=\"auto, (max-width: 756px) 100vw, 756px\" \/><\/a><\/p>\n<p>The value created for users is significant. With almost a billion users [3], and using my own personal spam folder to estimate 20 spam emails per day, and ~1 second saved per spam email, I estimate ~200,000 man-years saved per year. On a per-user basis the numbers are not as impressive, but with little differentiation between email services, a bad spam filter just might be what\u2019s needed to make a user switch. 
That is not to mention the benefit of not falling prey to the various scams out there.<\/p>\n<p>&nbsp;<\/p>\n<p>The value is captured somewhat indirectly. By offering a better email experience, Google hopes to get even more users to switch to Gmail, and to use it more extensively (at the expense of its competitors). With more users, Gmail can of course display more email ads. But even more than that, there are multiple synergies with Google\u2019s other products; the most obvious one is probably better targeting of Google search ads based on the user\u2019s online behavior, as manifested in their email account (e.g., if a user receives lots of emails from Amazon, maybe they\u2019re more likely to click a link to Amazon\u2019s website).<\/p>\n<p>&nbsp;<\/p>\n<p>This brings us to two unique challenges faced by Gmail when designing its spam filters. One, it never actually observes whether it made the right decision. It can use <em>proxies<\/em>, such as whether the user ignored the message or deleted it, to infer whether they were interested in receiving it in the first place. But it doesn\u2019t really know \u2013 maybe the user wanted to receive the email, but was just quick to process the information contained in it. Another challenge is understanding the cost\/value of a false negative and false positive. Unlike a gaming company that can easily track whether it increased spending per user, with spam filters, it\u2019s not obvious whether the users are actually better off after a change, and even more so whether Google overall is better off with all the aforementioned synergies.<\/p>\n<p>&nbsp;<\/p>\n<p>Going forward, the spam filter development is likely to continue its cat-and-mouse dynamic. Spammers are constantly trying to adapt to the rules adopted by Gmail\u2019s spam filters. Already the internet is full of lists such as \u201c16 Ways To Get Your Email Past Spam Filters\u201d [4]. 
What\u2019s almost certain, though, is that the world today is far more spam-free than it was ten years ago (e.g., 2010 showed an ~80% drop in spam volume [5]) \u2013 perhaps a result of better filtering eroding the financial upside of spamming \u2013 making the digital world a better place.<\/p>\n<p>&nbsp;<\/p>\n<p><strong><u>References<\/u><\/strong><\/p>\n<p>Photo credit: <a href=\"techydudes.com\">techydudes.com<\/a><\/p>\n<p>[1] \u2013 <a href=\"https:\/\/www.m3aawg.org\/sites\/default\/files\/document\/M3AAWG_2012-2014Q2_Spam_Metrics_Report16.pdf\">https:\/\/www.m3aawg.org\/sites\/default\/files\/document\/M3AAWG_2012-2014Q2_Spam_Metrics_Report16.pdf<\/a><\/p>\n<p>[2] \u2013 <a href=\"http:\/\/techcrunch.com\/2015\/07\/09\/google-improves-gmails-spam-filters-launches-new-analytics-tool-for-bulk-senders\/\">http:\/\/techcrunch.com\/2015\/07\/09\/google-improves-gmails-spam-filters-launches-new-analytics-tool-for-bulk-senders\/<\/a><\/p>\n<p>[3] \u2013 <a href=\"https:\/\/plus.google.com\/+Gmail\/posts\/AjktcDswdKh\">https:\/\/plus.google.com\/+Gmail\/posts\/AjktcDswdKh<\/a><\/p>\n<p>[4] \u2013 <a href=\"https:\/\/expresspigeon.com\/blog\/2014\/07\/28\/avoid-spam-filters\">https:\/\/expresspigeon.com\/blog\/2014\/07\/28\/avoid-spam-filters<\/a><\/p>\n<p>[5] \u2013 <a href=\"http:\/\/www.symantec.com\/connect\/blogs\/why-my-email-went\">http:\/\/www.symantec.com\/connect\/blogs\/why-my-email-went<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Gmail tirelessly improves its spam filters to identify exactly which messages shouldn\u2019t make it to your 
inbox<\/p>\n","protected":false},"author":51,"featured_media":2333,"comment_status":"open","ping_status":"closed","template":"","categories":[134,655,776,900,366,899],"class_list":["post-2332","hck-submission","type-hck-submission","status-publish","has-post-thumbnail","hentry","category-analytics","category-data","category-data-analytics","category-filter","category-machine-learning","category-spam"],"connected_submission_link":"https:\/\/d3.harvard.edu\/platform-digit\/assignment\/data-driven-value-creation-value-capture-and-operating-models\/","yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Gmail: ensuring a spam-free inbox with Machine Learning - Digital Innovation and Transformation<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Gmail: ensuring a spam-free inbox with Machine Learning - Digital Innovation and Transformation\" \/>\n<meta property=\"og:description\" content=\"Gmail tirelessly improves its spam filters to identify exactly which messages shouldn\u2019t make it to your inbox\" \/>\n<meta property=\"og:url\" content=\"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Digital Innovation and Transformation\" \/>\n<meta property=\"og:image\" content=\"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/Gmail-AntiSpam-FDG.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"500\" \/>\n\t<meta property=\"og:image:height\" content=\"489\" \/>\n\t<meta 
property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/\",\"url\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/\",\"name\":\"Gmail: ensuring a spam-free inbox with Machine Learning - Digital Innovation and Transformation\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2015\\\/11\\\/Gmail-AntiSpam-FDG.jpg\",\"datePublished\":\"2015-11-21T23:15:07+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/#primaryimage\",\"url\":\"https:\\\/\\\/d3.harvard.edu\\
\/platform-digit\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2015\\\/11\\\/Gmail-AntiSpam-FDG.jpg\",\"contentUrl\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2015\\\/11\\\/Gmail-AntiSpam-FDG.jpg\",\"width\":500,\"height\":489},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Submissions\",\"item\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/submission\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Gmail: ensuring a spam-free inbox with Machine Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/#website\",\"url\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/\",\"name\":\"Digital Innovation and Transformation\",\"description\":\"MBA Student Perspectives\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/d3.harvard.edu\\\/platform-digit\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Gmail: ensuring a spam-free inbox with Machine Learning - Digital Innovation and Transformation","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/","og_locale":"en_US","og_type":"article","og_title":"Gmail: ensuring a spam-free inbox with Machine Learning - Digital Innovation and Transformation","og_description":"Gmail tirelessly improves its spam filters to identify exactly which messages shouldn\u2019t make it to your inbox","og_url":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/","og_site_name":"Digital Innovation and Transformation","og_image":[{"width":500,"height":489,"url":"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/Gmail-AntiSpam-FDG.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/","url":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/","name":"Gmail: ensuring a spam-free inbox with Machine Learning - Digital Innovation and Transformation","isPartOf":{"@id":"https:\/\/d3.harvard.edu\/platform-digit\/#website"},"primaryImageOfPage":{"@id":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/#primaryimage"},"image":{"@id":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/Gmail-AntiSpam-FDG.jpg","datePublished":"2015-11-21T23:15:07+00:00","breadcrumb":{"@id":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/#primaryimage","url":"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/Gmail-AntiSpam-FDG.jpg","contentUrl":"https:\/\/d3.harvard.edu\/platform-digit\/wp-content\/uploads\/sites\/2\/2015\/11\/Gmail-AntiSpam-FDG.jpg","width":500,"height":489},{"@type":"BreadcrumbList","@id":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/gmail-ensuring-a-spam-free-inbox-with-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/d3.harvard
.edu\/platform-digit\/"},{"@type":"ListItem","position":2,"name":"Submissions","item":"https:\/\/d3.harvard.edu\/platform-digit\/submission\/"},{"@type":"ListItem","position":3,"name":"Gmail: ensuring a spam-free inbox with Machine Learning"}]},{"@type":"WebSite","@id":"https:\/\/d3.harvard.edu\/platform-digit\/#website","url":"https:\/\/d3.harvard.edu\/platform-digit\/","name":"Digital Innovation and Transformation","description":"MBA Student Perspectives","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/d3.harvard.edu\/platform-digit\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/hck-submission\/2332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/hck-submission"}],"about":[{"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/types\/hck-submission"}],"author":[{"embeddable":true,"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/users\/51"}],"replies":[{"embeddable":true,"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/comments?post=2332"}],"version-history":[{"count":0,"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/hck-submission\/2332\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/media\/2333"}],"wp:attachment":[{"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/media?parent=2332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/d3.harvard.edu\/platform-digit\/wp-json\/wp\/v2\/categories?post=2332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}