Handling missing data is a crucial step in the data preprocessing pipeline for any machine learning project. Imputation, the process of replacing missing data with substituted values, is essential for building robust and reliable models. This article explores various imputation techniques, provides code examples, and …
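As a rough illustration of the idea in this excerpt, here is a minimal sketch of mean imputation in plain Python. The function name `impute_mean` and the sample `ages` list are hypothetical, chosen only for this example. Mean imputation is just one simple strategy and not necessarily the one the article recommends.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values.

    Hypothetical helper for illustration only.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # With no observed values there is nothing to base the fill value on.
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up sample data with two missing entries.
ages = [22, None, 30, None, 26]
imputed_ages = impute_mean(ages)
print(imputed_ages)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so it is usually a baseline rather than a final choice.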
Data leakage is a critical issue in machine learning that can lead to overly optimistic performance metrics and poor generalization to new data. This article explains what data leakage is, why it is problematic, and how to avoid it during data pre-processing. What is Data Leakage? Data leakage occurs when information …
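One common way leakage creeps into pre-processing is computing statistics (for example, a mean and standard deviation for scaling) over the full data set before splitting it. The sketch below, in plain Python, fits the scaling statistics on the training split only and then reuses them for the test split. The helper names `fit_scaler` and `transform` and the sample numbers are hypothetical, and this is an illustration of the general principle rather than the article's own code.

```python
from statistics import mean, pstdev


def fit_scaler(train):
    """Learn scaling statistics from the training split only."""
    return mean(train), pstdev(train)


def transform(values, mu, sigma):
    """Standardize values using statistics learned elsewhere."""
    return [(v - mu) / sigma for v in values]


# Made-up data: the test point is an outlier relative to the training data.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0]

# Correct: statistics come from the training split alone.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)

# Leaky (shown only for contrast): the test value influences the statistics.
leaky_mu, leaky_sigma = fit_scaler(train + test)

print(mu, leaky_mu)
```

The two means differ because the leaky version has already seen the test point, which is exactly the information a model should not have access to during training. This sketch assumes the training split has non-zero variance; a real pipeline would need to handle `sigma == 0`.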
Definition: Simply put, a Bloom filter is a space-efficient probabilistic data structure that can tell us whether an item is probably present in a data set, or definitely not present in it. Doing all this in a memory space …
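To make the definition concrete, here is a minimal Bloom filter sketch in Python. The class name, the default sizes, and the choice of salted SHA-256 as the hash family are all assumptions made for this illustration. The bit array is stored as one byte per bit for readability, so this version is not actually space-efficient.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("apple")
print(bf.might_contain("apple"))
```

An item that was added always reports as possibly present, because all of its positions were set. An item that was never added usually reports as absent, but it can collide with positions set by other items, which is where the false positives come from.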
If you have forked a repository on GitHub that you want to work on, you make your changes locally and, once you are confident that everything works, issue a pull request to the upstream so that your changes can be reviewed and merged. Here is a small snippet that I use to keep my fork up-to-date with the …
Basically, a distributed system is one in which the components or processes that comprise the system are distributed across nodes, or sometimes even across geographies. But these components need to communicate with each other to accomplish something meaningful. The most efficient, scalable and proven way of making …
I have been a happy user of the free Travis CI service over the last couple of years. But one recent announcement made me deeply worried. Have a look here at their announcement. Yes, Travis CI will no longer be free for OSS projects. There is no point in blaming Travis CI for this; the blame lies rather with those idiots who abused …
Recently I have been exploring Kubernetes tooling, especially tools that can pre-validate the schema and the state of my Kubernetes deployment resources or, more precisely, the YAML files. I came across a few of them, like conftest and kubeval, and a wrapper around such a set of tools like …
Some time around the summer of 2019, I came across a blog post about GitOps and was floored by the idea. Ever since, I have wanted to give it a try on some serious projects, but I never had the chance to do it professionally. I had a chat about this with many of my colleagues at work and my team, they …
I wanted to play around with OpenCV and thought it might be a good idea to try OpenCV with a real-life use case. DIY'ing a home camera system that can do motion detection and capture images when there is some movement in the frame sounded like a cool idea. So I researched how I could get this setup done. There were …
Some time ago I managed to set up a 4-node K8s cluster on a set of Raspberry Pis that were lying idle at my home. In case it interests you, please have a look here for the complete setup, the required components, and how to get it up and running.