Data Science

How can you draw relevant conclusions without knowing anything about the underlying data?

Photo by Jason Blackeye on Unsplash

Let’s say you dream about building your own fishing cabin by the lake. In your garage, you find a bunch of tools, a pile of wooden planks, and various nails and screws. Would you rather:

a) Start building the walls immediately, without checking whether you have enough nails, a proper hammer, or even enough planks. Who cares? You will figure it out eventually, somewhere in the middle of the process.
b) Or take a moment to check your tools, sort the planks by size and condition, perhaps throw away the rusty nails and go to the shop to…
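
In data terms, option b) is a first profiling pass: before modeling anything, count what is missing and look at the range of each variable. The sketch below is a minimal pure-Python illustration under stated assumptions: records are plain dicts, `None` marks a missing value, and the `profile` helper and the plank inventory are hypothetical examples rather than code from the article.

```python
from statistics import mean


def profile(rows):
    """Return a per-column summary of a list of dict records.

    None is treated as a missing value. Fully numeric columns get
    min/max/mean; other columns get a count of distinct non-missing values.
    """
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary = {"missing": len(values) - len(present)}
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if present and len(numeric) == len(present):
            summary.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        else:
            summary["distinct"] = len(set(present))
        report[col] = summary
    return report


# Hypothetical inventory of planks: one length and one condition are unknown
planks = [
    {"length_cm": 200, "state": "good"},
    {"length_cm": 150, "state": "damaged"},
    {"length_cm": None, "state": "good"},
    {"length_cm": 250, "state": None},
]

for column, summary in profile(planks).items():
    print(column, summary)
```

Even this tiny report answers the cabin questions up front: how many items are unusable (missing values) and whether the sizes you have fit the plan (min, max, mean).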

Apache Spark

How to code the key functions for doing Exploratory Data Analysis in PySpark

Photo by Jez Timms on Unsplash

Our (love-)story with Spark started when the company decided to venture into the exciting world of AI. Very quickly, we came across the Databricks platform and, after exploring it during a two-week proof of concept (POC), we were ready to rumble.

From a Data Scientist’s perspective:

“The platform simplifies the Spark environment setup and offers collaborative workspaces for exploration, visualization, and model deployment at light speed.”

For help getting started with Databricks, check out this article:

Starting to work with Spark can feel like (almost) having to learn a new language: PySpark. It is close enough to…

Data Science

Most firms that think they want advanced AI/ML really just need linear regression on cleaned-up data [Robin Hanson]

Beyond the sarcasm of this quote lies a reality: of all statistical techniques, regression analysis is often cited as one of the most important in business analysis. Many companies use it to explain a particular phenomenon, build forecasts, or make predictions. The resulting insights can be extremely valuable for understanding what makes a difference in the business.

When you work as a Data Scientist, building a linear regression model can sound pretty dull, especially when everyone around you is talking about AI. However, I want to stress that mastering the main assumptions of linear regression…
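
Checking those assumptions (linearity, constant variance, independence and normality of the errors) usually starts from the fitted line and its residuals. Here is a minimal sketch of simple, one-predictor ordinary least squares in pure Python; the helper names `fit_simple_ols` and `residuals` and the data points are hypothetical, and in practice a library such as statsmodels or scikit-learn would do the fitting.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values: the raw material for assumption checks."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data, roughly following y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_ols(x, y)
res = residuals(x, y, b0, b1)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
print("residuals:", [round(r, 2) for r in res])
```

With an intercept in the model, OLS residuals always sum to (numerically) zero, so that total says nothing on its own; the useful diagnostics are the residuals’ pattern against x and whether their spread stays constant.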


Avoid the most common pitfalls with this reading

Indexes are powerful indicators of global and country-specific economies. Governments and traders use them extensively to formulate economic policy, refine external trade, and measure changes in the value of money.

Indexes are meant to reflect changes in a variable (or group of variables) with respect to, for example, time or geographical location. That’s why they are commonly used to compare the level of a phenomenon on a given date with its level on a previous date, or its levels at different places on the same date.

For example, the price of oil in March 2020 compared to the price in…
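
The simplest case is an unweighted index that expresses each value relative to a base value, scaled so the base equals 100: I_t = 100 × V_t / V_0. The sketch below assumes exactly that simple index; the function names are hypothetical and the prices are placeholders, not real oil prices.

```python
def simple_index(value, base_value):
    """Index of `value` relative to `base_value`, with the base scaled to 100."""
    if base_value == 0:
        raise ValueError("base value must be non-zero")
    return 100.0 * value / base_value


def index_series(values, base_position=0):
    """Express every value of a series relative to the value at base_position."""
    base = values[base_position]
    return [simple_index(v, base) for v in values]


# Hypothetical monthly prices (placeholders, not real oil prices)
prices = [50.0, 55.0, 40.0]
print(index_series(prices))
```

An index of 80 then reads directly as "20% below the base period". The same `simple_index` function covers the geographical comparison: pass the level in one place as `value` and the level in the reference place as `base_value`.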

Music experience

A quick overview of how it creates a unique listening experience.

8D music sounds like a new kind of audio technology coming straight from the future. In reality, this technology, along with ambisonics, has existed since the ’70s but never really took off.

Then, all of a sudden, a message promoting new Pentatonix music composed with 8D technology went viral on WhatsApp. Maybe you have already received it from friends; if not, don’t worry, it will come:

The viral message on WhatsApp promoting both 8D music and Pentatonix sounds.
Source: Author
The soundtrack following the message on WhatsApp

Why now?

  • Maybe because the technology used to simulate the sensory experience is now much better.
  • Or perhaps because…

Machine learning

What you need to know about this powerful machine learning algorithm before you start

Photo by Vladislav Babienko on Unsplash

What is Random Forest?

According to the official scikit-learn documentation: “A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement.”

In other words, Random Forest is a powerful, yet relatively simple, data mining and supervised machine learning technique. It allows quick and automatic identification of relevant information from extremely large datasets. The biggest strength of the algorithm is that it relies on the…
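
The quote’s two key ingredients, bootstrap samples the same size as the input and averaging over many trees, can be sketched in pure Python. This is bagging of one-split “decision stumps” on a single feature, a deliberately simplified stand-in: a real random forest (for example scikit-learn’s `RandomForestClassifier`) grows full decision trees and also considers a random subset of features at each split, which a one-feature example cannot show. All function names and data below are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (same size as the input)."""
    return [rng.choice(data) for _ in range(len(data))]


def fit_stump(samples):
    """Fit a one-split classifier on (x, label) pairs, labels in {0, 1}.

    Every observed x is tried as a threshold, with both label orientations;
    the split with the fewest training errors wins.
    """
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def fit_bagged_stumps(data, n_models=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def predict_proba(models, x):
    """Average the 0/1 votes: the share of stumps that predict class 1."""
    return sum(model(x) for model in models) / len(models)


# Hypothetical one-feature dataset: small x -> class 0, large x -> class 1
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = fit_bagged_stumps(data)
print(predict_proba(models, 1), predict_proba(models, 9))
```

Averaging the votes gives a score between 0 and 1. A stump trained on an unlucky bootstrap sample (say, one that drew only class-1 points) may misclassify, but it is outvoted by the others, which is the variance-reduction effect the documentation describes as controlling over-fitting.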

Aurélie Giraud

Analytic Translator | AI/ML & Statistics Player | Unlock Business Opportunities ✅ https://www.linkedin.com/in/aureliegiraud
