SQL

Generate Histogram Distribution

For a quick exploration of the distribution

Travis Tang

03 Jan 2023 — 1 min read

Exporting data to Python just to look at a histogram is cumbersome. Instead, use SQL to generate a frequency distribution for a quick exploration.

Here's how:
- Use FLOOR to assign each value to a bucket
- Fix missing (empty) buckets with GENERATE_SERIES
- Combine the two steps so that every bucket appears, even with a count of zero
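
The three steps above can be sketched as a single PostgreSQL query. This is only an illustration under stated assumptions: the table `orders`, its column `amount`, and the bucket width of 10 are hypothetical and not taken from the original post.

```sql
-- Assumptions: a hypothetical table orders(amount) and an arbitrary bucket width of 10.
WITH bucketed AS (
    -- Step 1: FLOOR maps each value to the lower bound of its bucket.
    SELECT (FLOOR(amount / 10.0) * 10)::int AS bucket_start,
           COUNT(*) AS frequency
    FROM orders
    GROUP BY 1
),
all_buckets AS (
    -- Step 2: GENERATE_SERIES lists every bucket between the lowest and
    -- highest one, including buckets that contain no rows.
    SELECT s.bucket_start
    FROM GENERATE_SERIES(
             (SELECT MIN(bucket_start) FROM bucketed),
             (SELECT MAX(bucket_start) FROM bucketed),
             10
         ) AS s(bucket_start)
)
-- Step 3: LEFT JOIN the full bucket list to the counts, so that
-- empty buckets show up with a frequency of zero.
SELECT a.bucket_start,
       a.bucket_start + 10 AS bucket_end,
       COALESCE(b.frequency, 0) AS frequency
FROM all_buckets AS a
LEFT JOIN bucketed AS b USING (bucket_start)
ORDER BY a.bucket_start;
```

Casting the bucket bound to an integer keeps the join keys and the GENERATE_SERIES arguments in the same type. If the values are fractional or need a non-integer bucket width, the cast and the step would need adjusting.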

Some caveats:
- Bucketing here is done manually, in a rather hacky way. Python's automatic bucketing is superior.
- GENERATE_SERIES is a PostgreSQL function. Equivalents are available in T-SQL, Redshift, and BigQuery (where it is called GENERATE_ARRAY).
- Some dashboarding tools (such as Metabase) have a built-in histogram visualization. Use that instead if your tool supports it.

Use ARRAY_AGG to flatten columns to lists.

Need to convert a long table of values into a list? This is the most convenient function. 📌 Syntax: SELECT ARRAY_AGG(column) FROM table. There you go, the column is flattened into a comma-separated list. ❌ To do so, I used to copy the entire column into a spreadsheet tool, transpose it, and use

Lazypredict: Run All Sklearn Algorithms With a Line Of Code

How to (and why you shouldn’t) use it

Convert Jupyter Notebooks into Functions

Papermill is an open-sourced tool for parameterizing, executing, and analyzing Jupyter notebooks. Just pass in parameters to the notebook, and the Jupyter notebook runs automatically.

Pivot Table (From Long to Wide)

Every data scientist, scientist, and engineer should know how to create a pivot table. CASE WHEN is the best way to do so in SQL. 🚀 Here are THREE ways to pivot a table for monthly sales. 1️⃣ We find the SUM across all salespeople each month. Here, use SUM(CASE WHEN). 2️⃣ We find the average
