NGrams and Edge NGrams – space complexity

In my previous post NGrams and Edge NGrams – computational complexity, I explored the runtime implications of working both with NGrams and Edge NGrams. This is a follow-up post that will build on the previous one so if you are not familiar with these terms, I would suggest checking it first. In this post, I would like to take a look at the space requirements of NGrams and Edge NGrams and how to estimate the size required to store the data intended for your indices.

Continue reading “NGrams and Edge NGrams – space complexity”

NGrams and Edge NGrams – computational complexity

In my current project, we do a bunch of work revolving around Elasticsearch and enabling our customers to quickly access the relevant portions of our large data set. A couple of weeks ago I was asked to come up with a method to compare the costs of working with NGrams and Edge NGrams. I tried to make my life easier and look around the Internet for somebody else’s breakdown but I didn’t find anything I would like. So I decided to bite the bullet and do the work myself. In this post and the follow up one, I would like to present my way of reasoning about NGrams and Edge NGrams.

Continue reading “NGrams and Edge NGrams – computational complexity”