In this post, we will take a look at what has changed in the latest ElementBot version: 5.0.0. More specifically, we will cover what has changed in our Reddit image searching algorithm and how we overcame the issues we encountered along the way.
Introduction of the problem
How did our system work before? It was quite terrible, if I am being honest. A task was scheduled to run every few seconds; it would fetch the subreddits of a category and then replace a file with the new data. When a user executed a command, the bot would go through the whole list and check whether each image was already in the database. If it was, the bot kept going; if it was not, the bot stopped, sent the image, and finally added it to the database.
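To make the cost of that flow concrete, here is a minimal Python sketch of it. It is not the bot's actual code: the function name `next_unseen_image` is hypothetical, and a plain `set` stands in for the database of images already sent.

```python
from typing import Optional


def next_unseen_image(images: list[str], seen: set[str]) -> Optional[str]:
    """Scan the list from the start and return the first image not sent yet.

    `seen` stands in for the database (an assumption for this sketch);
    in the old design every membership check was a database lookup.
    """
    for url in images:
        if url in seen:
            # Already sent: keep going to the next image.
            continue
        # Not sent yet: record it and stop.
        seen.add(url)
        return url
    # Every image in the list has already been sent.
    return None
```

The important detail is that every command starts the scan from the beginning, so the number of lookups grows with the number of images already sent.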
If you are a developer yourself, you might already be screaming and calling us incompetent programmers, and yes, this method wastes a lot of resources and increases the wait time considerably. Imagine having to do this every time someone in a guild of 5,000 people executes a command. This approach is more or less acceptable at a small scale, but at a large scale it is horrendous, and as we grew from 1k to 2k and from 2k to 3k guilds, we realized what a horrible design choice we had made. The problems did not stop there, however: our system did not check that a post actually contained media we could display in our limited Discord environment, which resulted in a lot of blank messages from our bot. This was not sustainable at all, and something had to be done about it.
We had to think about a solution, and that is when we figured out that we needed a cache. We needed to do a lot of caching in ElementBot because, as you can imagine, at that point we just requested everything from the database whenever we needed it, wasting 8GB of RAM and exhausting our processor because of our laziness.
The blank message issue also had a fix: looking at the media type that Reddit was so kind to provide for each post, although this still introduced a new issue that will be discussed later.
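A check like this could look roughly as follows. This is only a sketch under assumptions: it assumes the media type is read from the `post_hint` field that Reddit's JSON listings include on many posts, and that only the `"image"` hint can be displayed; the set of accepted hints and both function names are hypothetical.

```python
# Hints we assume the bot can display; the real list may differ.
DISPLAYABLE_HINTS = {"image"}


def is_displayable(post: dict) -> bool:
    """Return True if the post's media type is one we assume we can show."""
    # Posts without a hint (for example text posts) are treated as not displayable.
    return post.get("post_hint") in DISPLAYABLE_HINTS


def filter_displayable(posts: list[dict]) -> list[dict]:
    """Keep only the posts that pass the media type check."""
    return [post for post in posts if is_displayable(post)]
```

Filtering at fetch time, rather than when a command runs, means a blank post never reaches the stored list in the first place.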
Now the issue was actually implementing the cache. The data to save was quite obvious: the index of the last image shown to a guild within a specific category. With that information, we could jump straight to the next image index without bothering the database to check all the previous images. We could have just saved everything in an array in our code, but we were too lazy to apply that solution: we would have had to make sure the cache was cleared, or else it would eat our RAM, and we just could not be bothered. That is why we chose Redis to do the work for us.
This also had the advantage that the cache would not be cleared when we restarted the bot, and that the code running the cache would be solid, as Redis is widely used.
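The index idea can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the key layout `imgidx:<category>:<guild_id>`, the function `next_image`, and the choice to wrap around to the first image once the list is exhausted. The `InMemoryCounter` class is a stand-in that exposes the same `incr` call as a redis-py client, so the sketch runs without a Redis server.

```python
from typing import Optional


class InMemoryCounter:
    """Stand-in with the same `incr` call as a redis-py client (for local runs)."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def incr(self, key: str, amount: int = 1) -> int:
        # Like Redis INCR: a missing key starts at 0, and the new value is returned.
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]


def next_image(store, images: list[str], guild_id: int, category: str) -> Optional[str]:
    """Return the next image for a guild by advancing its cached index."""
    if not images:
        return None
    key = f"imgidx:{category}:{guild_id}"  # hypothetical key layout
    # The first call returns 1, so subtract 1 to get a zero-based position.
    position = store.incr(key) - 1
    # Wrapping around at the end of the list is an assumption of this sketch.
    return images[position % len(images)]
```

With a real server, `store` could be a `redis.Redis()` client instead, since it offers the same `incr` method. Each command then costs one counter increment and one list access, no matter how many images the guild has already seen.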
We also had one last issue to solve. Because the previous version replaced the file containing all images on every refresh, there were never many images to show. In the new version, everything is kept until the file reaches 50MB, approximately 117,500 image objects per category within a file. This ensures people can always enjoy our content.
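Those numbers work out to roughly 425 bytes per image object (50,000,000 / 117,500, taking 1MB as 10^6 bytes). A size-capped store along these lines could be sketched as below. Note the assumptions: the post does not say what happens once the limit is hit, so dropping the oldest entries first is a guess, and `merge_posts`, `encoded_size`, and the `url` field are hypothetical names.

```python
import json
from collections import deque

MAX_BYTES = 50 * 1000 * 1000  # the 50MB cap, taking 1MB as 10^6 bytes


def encoded_size(post: dict) -> int:
    """Size in bytes of one post when serialized as JSON."""
    return len(json.dumps(post).encode("utf-8"))


def merge_posts(stored: list[dict], fresh: list[dict], max_bytes: int = MAX_BYTES) -> list[dict]:
    """Append new posts to the stored ones instead of replacing them.

    Duplicates (same `url`) are skipped. If the total size exceeds the cap,
    the oldest posts are dropped first (an assumption of this sketch).
    """
    known = {post["url"] for post in stored}
    merged = deque(stored)
    for post in fresh:
        if post["url"] not in known:
            known.add(post["url"])
            merged.append(post)
    # Approximate the file size as the sum of the per-post sizes.
    total = sum(encoded_size(post) for post in merged)
    while merged and total > max_bytes:
        total -= encoded_size(merged.popleft())
    return list(merged)
```

The size here ignores the few bytes of list separators, which is close enough for a cap of this magnitude.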
Then we realized the problem that this introduced: old posts have a chance of getting deleted, and when a post is deleted, its media does not show and the request returns a 404 error instead. However, a very low percentage of posts are deleted, so we decided to just ignore the flaw altogether, as it would not bother many people.
So that is how ElementBot runs nowadays. My testers and I have noticed a big speed increase and fewer blank images, which is awesome. Now we have a product to be proud of.
- Cache: Program or piece of code that saves something so that, when a new request comes in, it does not have to query the database or another medium of more permanent storage. Data only survives for some seconds or hours in a cache.
- Array: Ordered group of values stored in a single variable.