SQL

Find duplicates with RANK

In: SQL

Let's clean out duplicates! Here's how.

1️⃣ Check if duplicates exists.
Use π™Άπšπ™Ύπš„π™Ώ π™±πšˆ + π™²π™Ύπš„π™½πšƒ(*) to find duplicates. See example 1 in image below.

2️⃣ Identify exact rows that are duplicated.
Use πšπ™°π™½π™Ί π™Ύπš…π™΄πš(π™Ώπ™°πšπšƒπ™Έπšƒπ™Έπ™Ύπ™½ π™±πšˆ πšŒπš˜πš•πšœ). See example 2.

What does πšπ™°π™½π™Ί π™Ύπš…π™΄πš(π™Ώπ™°πšπšƒπ™Έπšƒπ™Έπ™Ύπ™½ π™±πšˆ πšŒπš˜πš•πšœ) do?
πŸ‘‰ divides the table into multiple partitions, one for each unique value of cols, then
πŸ‘‰ assigns a rank from 1 to the number of rows in the partition, then
πŸ‘‰ returns more than 1 if the row is duplicated

You can even find duplicated values in TWO columns.

In example 3, I find duplicated show_name and actor_name with this:
πšπ™°π™½π™Ί π™Ύπš…π™΄πš(π™Ώπ™°πšπšƒπ™Έπšƒπ™Έπ™Ύπ™½ π™±πšˆ πšœπš‘πš˜πš _πš—πšŠπš–πšŽ, πšŠπšŒπšπš˜πš›_πš—πšŠπš–πšŽ) > 𝟷

As usual, full example below.

More from Travis Tang
Use ARRAY_AGG to flatten columns to lists.
SQL

Use ARRAY_AGG to flatten columns to lists.

You need to convert a long table of values into a list? This is the most convenient function. πŸ“Œ Syntax: πš‚π™΄π™»π™΄π™²πšƒ π™°πšπšπ™°πšˆ_𝙰𝙢𝙢(πšŒπš˜πš•πšžπš–πš—) π™΅πšπ™Ύπ™Ό πšπšŠπš‹πš•πšŽ There
Pivot Table (From Long to Wide)
SQL

Pivot Table (From Long to Wide)

Every data scientist, scientist, and engineer should know how to create a pivot table. π™²π™°πš‚π™΄ πš†π™·π™΄π™½ is the best way to do
Great! You’ve successfully signed up.
Welcome back! You've successfully signed in.
You've successfully subscribed to Travis Tang.
Your link has expired.
Success! Check your email for magic link to sign-in.
Success! Your billing info has been updated.
Your billing was not updated.