The day when the computer becomes a data scientist

Don’t be shocked! You knew that day would come… you could have even predicted the exact date, because you are a highly skilled data scientist!

Relax, that day has not come yet. No computer is threatening your position, and you can skip the help-wanted ads in the newspaper for now. (Who reads newspapers today anyway?!)

So, when should data scientists start worrying about losing their jobs? Probably not in the coming years, but they should be aware that parts of their work can be done by a computer algorithm.

Sounds a bit surprising, right? After all, data scientist is one of the most popular and in-demand jobs in the tech industry. So why am I so pessimistic about its future? Why am I writing an article that may dampen the motivation of those who are eager to become data scientists?

Before answering that, let’s briefly discuss what a typical data scientist is: a person who can study, analyze and interpret huge amounts of data by using mathematical and statistical methods as well as Machine Learning and Neural Network algorithms. With these methods, a skilled data scientist can make predictions in almost every field.

The data scientist usually starts every project by digging into the data (using charts, scatter plots, histograms and other visual tools), then cleaning it by dropping irrelevant variables and filling in missing data, AKA preprocessing. The next step is choosing the right classifier or regression method, followed by picking the right features in the data in order to get the most accurate prediction. In between, the data scientist tests different combinations of classifier parameters to obtain the most accurate and efficient prediction mechanism.

All of these steps and methods demand strong analytical and comprehension skills from the person who applies them, and right now it doesn’t look like a computer can do all of them better than a human being.

Nevertheless, the computer plays an important role in many parts of the data scientist’s projects. A good example is cross-validation in a model selection module, where an algorithm ‘finds’ the best classifier or the best classifier parameters. This saves time for the scientist, who can skip this part and focus on other challenges.
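To make the cross-validation idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions of my own: the data is synthetic, the classifier is a tiny k-nearest-neighbors written for the example, and the names `knn_predict` and `cross_val_accuracy` are hypothetical. Libraries such as scikit-learn offer ready-made versions of this search.

```python
import random

def knn_predict(train, point, k):
    """Return the majority label among the k training rows nearest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def cross_val_accuracy(data, k, folds=5):
    """Mean k-NN accuracy over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        validation = data[i::folds]  # every folds-th row, starting at i
        training = [row for j, row in enumerate(data) if j % folds != i]
        hits = sum(knn_predict(training, x, k) == y for x, y in validation)
        scores.append(hits / len(validation))
    return sum(scores) / folds

# Synthetic two-class data (an assumption for the example): two Gaussian blobs.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(3, 1), rng.gauss(3, 1)), 1) for _ in range(50)]
rng.shuffle(data)

# The "algorithm finds the best parameter" step: try each k, keep the best.
candidates = [1, 3, 5, 7, 9]
scores = {k: cross_val_accuracy(data, k) for k in candidates}
best_k = max(scores, key=scores.get)
print("accuracy per k:", scores)
print("best k:", best_k)
```

The scientist only supplies the list of candidates; the loop does the repetitive comparison.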

Another potential ‘data science’ task that the computer will be able to do in the near future (or maybe already does in some existing algorithms) is the initial scan and analysis of the data it receives. The computer will be able to decide by deep learning which parameters and features are important and which are not. It could even create histograms and scatter plots for every combination of features, and use image recognition to determine the type of dependency between one feature and another.

Now, think about how many companies are using big data analytics, and how many different types of classifiers and regressions have been developed in so many fields so far. It is quite possible that there is an ML algorithm for almost every new need. Think about the option of storing every developed ML algorithm in a database, so that whenever someone (without data science skills) needs to study and learn from data, the computer will choose the best classifier/analyzing method for that person’s needs from the database!

I guess you understand where I’m going with this: almost every part of the data scientist’s work can be, or will be able to be, done by a Machine/Deep Learning algorithm, or in other words, by Artificial Intelligence (AI), and it really should not surprise us. AI, which is one of the most important application and research fields in the tech industry, is taking over, and yes, data science plays a super important role in developing it! AI is responsible for many tasks that machines can now do by themselves. Who would have thought a few years ago that computers would be able to write code by themselves? But thanks to AI they can, and they will improve with time. Computers are replacing and will continue to replace humans in many fields, so it’s probably a matter of time before a computer also becomes a data scientist.

Think about it! Imagine a computer for which the only thing you need to do is give it the data you have and tell it what you want to know based on this data (see Fig. 1). The computer will have a simple, friendly interface that receives your data (you can just paste it from your Excel file), and with a smart AI algorithm it will give you the most accurate answer/prediction!

Fig. 1: A scheme that describes the steps needed for an AI machine to provide a prediction. In the described scenario, all available (unprocessed) data is given to the computer along with the feature we want to know/predict. These two inputs are processed by a sophisticated AI algorithm, which provides the most accurate prediction based on the data. The algorithm basically does all of the data scientist’s tasks.

Sounds awesome for the tech companies, but not for the future data scientists…

Ok, wait! Don’t panic, dear data scientists (or data scientists to be)! Your future is not that bad! I know I described a pessimistic scenario for data science, but I want to clarify that I was talking about the current tasks and projects of data scientists, which will probably be performed by the computer. I did not say that the data scientist career will ‘die’. It is quite possible that it will evolve into a different ‘line of work’. Perhaps the future data scientists will ‘supervise’ the computer and monitor its operations when it processes the data, perhaps they will develop new AI methods, or will focus on studying new theoretical and mathematical models (or perhaps they will do something else…). What is certain is that the future data scientists will deal with different tasks and challenges. I am not worried about the ‘transformation’ they will have to undergo. I believe they are skilled enough to learn and adapt to the ‘new’ data science future.

Whether you like it or not, future data science will be quite different from the present one. That is, as mentioned, thanks to AI, which ‘kills’ many occupations and changes professions, but at the same time ‘creates’ new positions in the tech industry. Data science (if it will still be called that) is one of the occupations that will probably not ‘die’ but will change, and it will also have a crucial role in future AI technology.

What will be the role and the new duties of the future data scientists? We can only guess or speculate (or develop an AI algorithm that will be able to predict that).


Can a Machine Learning algorithm predict the next fight with your girlfriend?

Imagine the following situation: you are sitting in a lecture hall, listening to a lecture on mechanics, bored to death, fantasizing about the moment you kiss the girlfriend you’ve been dating for the last two months (love is in the air!). Only three more hours to go! Suddenly your i-Relation app sends you the following text: “Hey Josh, watch out! There is a 95% chance that you are going to have a serious fight with Emily today!” “Oh darn!” you must be thinking, but wait! This app has just saved your ass. You send the love of your life the following text message: “Hey sweetie, I’m so sorry, but something has come up, and I cannot see you today, let’s meet tomorrow instead.” She sends back a sad smiley. Looks like she is a bit disappointed, but tomorrow you will meet her, everything will be forgiven and together you will have a great time!

Thanks to that awesome app, you never fight with Emily, you two get married, bring three amazing children into the world, and your relationship is as perfect as a magical fairytale (again, thanks to your i-Relation app). You share this great app with the world through every social network, every couple stops fighting, and the world becomes a better place to live!

Couple fighting – illustration

Is this scenario realistic?

Well, at least some of it is. With the power of Machine/Deep Learning, this app is really doable. “How?” you are probably asking yourself. To answer that, let’s take a break from talking about conflicts in relationships and discuss some basic ideas in Machine Learning (ML). This “hot” field in artificial intelligence (AI) is used today by virtually every major IT company and provides predictions in almost every field: human behavior, spam detection, financial trends, smart car navigation, appropriate medical treatments and more.

How does it work? In most cases the machine (a computer program) gets a training dataset that includes several features. The dataset contains many combinations of these features, and every combination is labeled with its appropriate class. For example, assume we have a medical database of patients whose current physical condition is expressed by categories (features): body temperature, blood pressure, heart rate etc. For every patient we also have a medical diagnosis made by a physician (let’s call the medical diagnosis the ‘target’). We give the dataset to a computer program along with the medical diagnoses (targets), and the computer ‘learns’ which combinations of features go with which diagnosis.

Now we have a new patient with the following symptoms: a fever of 103°F (39.4°C), blood pressure of 160/90, a white blood cell count of 225,000 cells/mm³ and only 3 fingers on his right hand. We give the computer these features; the computer thinks for three milliseconds and determines that the patient has michtophunulumia!!! (Don’t try to look for it on Google, I’ve just made that up for the example.) The computer’s determination is based on the medical dataset we gave it before. It uses a classifier (such as a decision tree or random forest) to find the best match between the symptoms (features) of the new patient and the most appropriate medical condition, based on the training data.
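The train-then-predict flow described above can be sketched with the simplest possible decision tree: a single split, often called a decision stump. Everything in this sketch is an assumption for illustration: the patient records and the ‘sick’/‘healthy’ labels are invented, and `train_stump` and `predict` are hypothetical names, not taken from any library.

```python
from itertools import permutations

def train_stump(rows, labels):
    """Find the one feature/threshold split that misclassifies the fewest rows."""
    classes = sorted(set(labels))
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            # Try both ways of assigning the two classes to the two sides.
            for above, below in permutations(classes, 2):
                errors = sum(
                    (above if row[feature] >= threshold else below) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, above, below)
    return best[1:]  # (feature index, threshold, label above, label below)

def predict(stump, row):
    """Classify a new row with the learned split."""
    feature, threshold, above, below = stump
    return above if row[feature] >= threshold else below

# Invented training data: (body temperature in °F, heart rate in bpm).
patients = [
    (98.6, 70), (98.2, 65), (99.0, 80), (98.4, 90),
    (101.5, 95), (102.8, 88), (103.1, 110), (100.9, 72),
]
# The 'target': one diagnosis per patient, as a physician would provide it.
diagnoses = ["healthy"] * 4 + ["sick"] * 4

stump = train_stump(patients, diagnoses)
new_patient = (103.0, 100)
print(predict(stump, new_patient))
```

A real decision tree repeats this kind of split recursively, and a random forest combines many such trees, but the principle of learning from labeled examples is the same.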

This case is an example of how supervised learning works. Of course, there are many more methods and processes (validation, filtering out unnecessary information, preventing overfitting and more) that are used in ML code in order to improve prediction accuracy. For simplicity’s sake, this article will not discuss them in detail.

Now let’s go back to talking about conflicts and the prediction of their occurrence. Based on personal experience and on studying the subject among my friends, I can assure you that couples cannot predict most of their fights (otherwise they would have prevented them). I, for example, have known my wife for more than 10 years and still fall into the trap of fighting with her. With all the experience I’ve got with her, I cannot recognize the ‘seeds’ that have the potential to grow into a big fight (and I am definitely not alone in this).

The problem with predicting conflicts using the human brain is learning the most dominant causes, which can be many combinations of features in human behavior, physical mood and external causes. There is too much data to process in a short time. Of course there are the generic reasons for conflicts (cheating, money, problems with the kids), but in many fights we cannot point to the spark that lights the flames of the conflict…

So… let’s let the computer do all the hard work! To make that happen, every person will have the i-Relation app, which will know everything about them, and when I say everything, I mean EVERYTHING. (Don’t be shocked, many apps already learn a great deal about you by collecting information through the sensors in your smartphone.) The i-Relation app will know where you are, what you are eating, when you have your period, how much you slept, what disease you had a week ago, what you said to your banker when you saw him jogging, and of course, when you are fighting with your better half. Don’t worry, it sounds a bit complicated for your phone to know all of this, but it is quite possible: remember that it has a GPS, camera, microphone and many other sensors that can collect information every single second and save it in the cloud. Oh, by the way, both partners must have the app installed, so the first person’s app will also collect the data of the second person and vice versa.

Let’s assume that for optimal performance, the app needs to collect data continuously for at least two to three months. Once all the data is stored, the ML algorithm starts the classification! First, to prevent overfitting, the app’s supervised learning algorithm will have to find the most dominant parameters among the many features it has stored. It is only logical that every couple has different dominant parameters, so by using embedded methods and cross-validation, the app will remove the unneeded, irrelevant and redundant attributes for every tested couple.
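As a rough sketch of this pruning step, the code below runs a greedy forward selection scored by cross-validation. Note that this is a wrapper-style method that I swapped in because it is short to write in plain Python; it is not the embedded method mentioned above. The three features, the synthetic ‘days’ and the function names are all assumptions made up for the example.

```python
import random

def centroid_predict(train, point, features):
    """Nearest-centroid classifier that only looks at the chosen feature indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted({y for _, y in train}):
        members = [x for x, y in train if y == label]
        centroid = [sum(x[f] for x in members) / len(members) for f in features]
        dist = sum((point[f] - c) ** 2 for f, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(data, features, folds=5):
    """Cross-validated accuracy when only `features` are used."""
    hits = 0
    for i in range(folds):
        validation = data[i::folds]
        training = [row for j, row in enumerate(data) if j % folds != i]
        hits += sum(centroid_predict(training, x, features) == y for x, y in validation)
    return hits / len(data)

def forward_select(data, n_features, min_gain=0.02):
    """Add one feature at a time while it improves the cross-validated accuracy."""
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        scored = [
            (cv_accuracy(data, selected + [f]), f)
            for f in range(n_features) if f not in selected
        ]
        score, feature = max(scored)
        if score - best_score < min_gain:
            break  # the remaining features do not help enough
        selected.append(feature)
        best_score = score
    return selected, best_score

# Synthetic data for one hypothetical couple: label 1 = fight day, 0 = calm day.
feature_names = ["hours_slept", "coffee_cups", "stress_level"]
rng = random.Random(42)
days = []
for _ in range(60):
    days.append(((rng.gauss(8, 1), rng.gauss(2, 1), rng.gauss(3, 1)), 0))
    days.append(((rng.gauss(5, 1), rng.gauss(2, 1), rng.gauss(7, 1)), 1))

selected, best_score = forward_select(days, len(feature_names))
print([feature_names[f] for f in selected], round(best_score, 3))
```

In this made-up data, coffee consumption is the same on fight days and calm days, so it carries no signal; a different couple would of course have different dominant features.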

The next step is using a classifier or regressor for training. I guess the most suitable method for this app is logistic regression, which, despite its name, is used for classification: it models a binary response variable (remember that the main output of the app is “Fight today” or “No fight today”) and estimates its probability. Of course there is more to it than that, and this is just the basic principle of the app. Many more procedures (processing, pre-processing, validations, deep learning techniques, statistical analyses and more) should be included in the product, but the main idea is simple, or at least I hope it is.
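Here is a minimal sketch of how logistic regression could turn a day’s features into a fight probability, trained with plain gradient descent. The training days, the two features (hours slept and stress level, both already centered around a typical value) and the names `train_logistic` and `fight_probability` are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function, maps any number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def fight_probability(model, x):
    """Estimated probability of a fight for one day's features."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Invented days: (hours slept minus 7, stress level minus 5); 1 = fight, 0 = no fight.
rows = [(-2.0, 3.0), (-1.5, 2.0), (-3.0, 1.0), (-1.0, 4.0),
        (1.0, -2.0), (0.5, -3.0), (2.0, -1.0), (1.5, -2.5)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_logistic(rows, labels)
today = (-2.5, 3.5)  # little sleep, lots of stress
p = fight_probability(model, today)
print("Fight today" if p >= 0.5 else "No fight today", round(p, 2))
```

The probability is what lets the app say something like “a 95% chance”; the 0.5 threshold is an arbitrary choice that turns it into the binary “Fight today” / “No fight today” answer.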

To summarize, this app really can be realistic! Of course, it would take time to develop it and to obtain optimal performance. It is also not guaranteed to provide 100% accurate fight prediction. It should be noted, by the way, that no AI/ML app can provide 100% prediction accuracy today (although some come very close). There are some fields in AI where accuracy has improved dramatically in recent years (such as speech and face recognition), but others still need improvement. The proposed i-Relation app involves several ML and AI properties and techniques that, as said before, will probably require some time to work well. Nevertheless, I am sure that with the right knowledge, the day is not far off when we wake up in the morning and check our chances of fighting with our partner.

Your morning coffee with your morning notification about the chances of fighting with your boyfriend – illustration

This app is just another example of ML’s ability to predict almost everything, for better or worse. I guess that some will disagree with me (those in the divorce industry, for example), but I would be happy to have such an app!

