NL

Pivotal data moment #1

From VLOOKUP to join

It is 2015. A younger version of me is busy processing hundreds of survey forms in Excel, preparing to run an analysis in SPSS. Working on my thesis.

The phone call that started it

While cleaning the data I called a friend, Martijn. My question was simple: if table 1 and table 2 both contain the same ID, can I not look up a field from one in the other?

Answer: yes. VLOOKUP.

That moment showed me for the first time that separate tables do not have to stay separate. And what I did not know yet: in the data world this principle is not called VLOOKUP, it is called a join. And that is what this is about: the good old join!

If you understand VLOOKUP, you already understand a lot about joins

You have a table on the left, a table on the right. You use a shared column to connect them. That column is called a key. Up to that point, it is basically just VLOOKUP.

But when you are really engineering with data, the questions get bigger:

  • Which rows do I want to keep?
  • What do I do with records that have no match?
  • Do I expect one match or multiple?
  • Is my model actually correct?

That is exactly where it shifts from an Excel trick to real data modelling. For all those questions there are different joins. Click one to read the explanation, listed in order of complexity:

Click a join above to read the explanation.

Why this is more than a trick

A join is not the most technically complex part of data work.

If you understand the variations above, you are already halfway there. The syntax is not that hard either, especially with this page as a reference.

The only thing left after that is the question of whether you are actually joining the right tables, keys, and levels of detail. And that is exactly why joins are such a good topic.

They seem small. But they immediately touch on how you think about, structure, and validate data.

That is also why this topic naturally leads into my data modelling workshop: there it is no longer just about connecting tables, but about why a model does or does not work.

Why this was a pivotal moment for me

My data journey did not start with Power BI, not with SQL, and not with any fancy platform. It started with an Excel moment.

With the realisation that two tables together can tell more than either can alone. And honestly, I still think that is one of the best things about this field: making connections!

That is also how I approach topics like this on my site and in training: calmly, logically, without unnecessary smoke and mirrors. More about that in my approach.

Have a topic where you think: I want to really dig into this with my team? Use the contact form.

Created by Björn, with support of AI, owned by Dogoda. More disclaimers, here.