In my current project, we do a lot of work with Elasticsearch, enabling our customers to quickly access the relevant portions of our large data set. A couple of weeks ago, I was asked to come up with a method for comparing the costs of working with NGrams and Edge NGrams. I tried to make my life easier by looking around the Internet for somebody else’s breakdown, but I didn’t find anything I liked. So I decided to bite the bullet and do the work myself. In this post and the follow-up one, I would like to present my way of reasoning about NGrams and Edge NGrams.