Contributors: F M Nurul Huda Pathan, Soumya Ranjan Mishra

This article aims to help you understand the system design of the Uber application. We hope you enjoy the blog. Let’s start!

Uber is an American technology company that provides ride-hailing, food delivery (Uber Eats), package delivery, couriers, freight transportation, and, through a partnership with Lime, electric bicycle and motorized scooter rental. Most of its revenues come from ride-hailing.


Let’s first talk about time series before discussing the different aspects of feature engineering in time series.

A time series is a series of data points indexed in time order. Most commonly, it is a sequence taken at successive, equally spaced points in time. Examples of time series are the heights of ocean tides and the daily closing value of the NIFTY 50.

We can analyze time series data to extract meaningful statistics and other characteristics of the data. This extracted information can be used by traders or investment firms to enhance profits. We…


For any machine learning model we build, we need to validate its stability. We have to make choices about which predictive variables to use, what types of models to use, what arguments to supply to those models, etc. We make these choices in a data-driven way by measuring the model quality of the various alternatives. A train-test split, which divides the dataset into a training portion and a test portion, is one method of measuring model quality on the test data. Cross-validation extends this approach to model scoring (or “model validation.”) …
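To make the idea concrete, here is a minimal sketch of k-fold cross-validation in plain Python. It is only an illustration: the helper names `k_fold_indices` and `cross_val_mae`, and the "predict the training mean" baseline model, are assumptions made for this example and are not taken from the article.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


def cross_val_mae(ys, k):
    """Average mean absolute error over k folds for a hypothetical baseline
    model that always predicts the mean of its training targets."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        train_mean = sum(ys[i] for i in train_idx) / len(train_idx)
        mae = sum(abs(ys[i] - train_mean) for i in test_idx) / len(test_idx)
        fold_errors.append(mae)
    return sum(fold_errors) / len(fold_errors)
```

Every sample lands in the test set exactly once, so the averaged score uses all of the data rather than a single held-out slice. In practice the rows are usually shuffled first (except for time series, where order matters).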


With the advancement of AI, many challenging tasks can be accomplished with the help of technology in a short span of time. People who speak different languages can text each other using translation apps like Google Translate. There are many apps that can recognize the human voice and perform a particular task. Examples of such apps include Cortana, Siri, etc.

We might have come across terms such as speech recognition, natural language understanding, and natural language generation. All these terms belong to a field of AI called NLP. NLP stands for Natural Language Processing, which is a field that deals with…


When we talk about unsupervised machine learning algorithms, we have an intuition that the machine learning model will be fed with unlabeled data to discover the underlying patterns in the data. Clustering is one of the most important unsupervised machine learning techniques.

Now, let’s discuss in detail how clustering works. Suppose a company X wants to categorize its customers, based on how much they spend on its products, into 3 categories, viz. high spenders, low spenders, and average spenders. A clustering algorithm can help in solving this problem.
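As a rough illustration of the customer example, the sketch below groups spending amounts into 3 clusters with a simple one-dimensional k-means. k-means is just one possible clustering algorithm, chosen here as an assumption; the function name `kmeans_1d`, the way the starting centroids are picked, and the spending figures are all made up for this example.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups; returns (centroids, clusters)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and len(values)")
    ordered = sorted(values)
    # Assumption: start from k evenly spaced points of the sorted data
    # so that the result is deterministic.
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spending per customer.
spending = [12, 15, 14, 48, 52, 50, 95, 102, 98]
centroids, clusters = kmeans_1d(spending, k=3)
# clusters[0] holds the low spenders, clusters[1] the average spenders,
# and clusters[2] the high spenders.
```

The algorithm is never told which customer belongs to which category; the groups emerge only from how close the spending values are to each other.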

We can define clustering as the process of organizing objects into groups whose members are similar…


When we talk about supervised machine learning, linear regression is the most basic algorithm everyone learns in data science. Let’s try to understand the term regression.

Regression is a technique from statistics that is used to predict values of a desired target quantity when the target quantity is continuous. Regression analysis is a form of predictive modelling that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. This technique is used for forecasting, time series modelling, and finding cause-and-effect relationships between variables. For instance, if you want to study the relationship between road accidents and…


We might have heard product companies claim that their product is 95% effective in controlling a particular disease or an unwanted phenomenon. For example, a company claims that its product X kills 99.9% of germs. So how can they say so? There has to be a testing technique to back up this claim, right?

Here comes the concept of hypothesis testing, which is used to evaluate a claim or an assumption against data. Testing hypotheses is one of the main purposes of statistics. Let me introduce the key concepts of hypothesis testing and why it is required.

A hypothesis


Having a strong knowledge of Calculus, Linear Algebra, Probability Theory, and Statistics is an essential trait every great data scientist possesses. A solid understanding of these topics will help an aspiring data scientist a lot in learning machine learning models. It will give them an edge over their peers.

Let me give a brief introduction to some of these concepts in this article and discuss their role in an aspiring data scientist’s career growth.

Random Variable: A random variable is a numerically valued variable that takes on different values with given probabilities. …
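A small sketch may help with the definition. Here a fair six-sided die is modelled as a random variable, and its expected value is computed from the probabilities. The die example and the helper name `expected_value` are assumptions chosen for illustration.

```python
from fractions import Fraction


def expected_value(pmf):
    """Expected value of a discrete random variable.

    pmf maps each possible value to its probability.
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in pmf.items())


# A fair die: each face from 1 to 6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # prints 7/2
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 is not affected by floating-point rounding.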

F M Nurul Huda Pathan

Aspiring Data Scientist
