Purpose
The purpose of this project is to ingest publicly available BOM environment data, load it into a database and present the findings using a BI dashboard. It must update daily by checking the data sources for new files.
It’s also designed to showcase my skills as a data analyst working across Unix/Linux, databases, BI tools, Excel and Python.
Scope
Audience
< Show diagram – The plan >
Identify data source and where files are kept. Look for dictionary and online help.
Link to BOM sources:
https://www.bom.gov.au/catalogue/anon-ftp.shtml
https://www.bom.gov.au/water/regulations/dataDelivery/howto/ftpAccount.shtml
Shell Script to Download Data
Below is a bash script to perform a recursive download of all CSV files from the BOM sites. I’m limiting myself to VIC BOM sites, but the script can be expanded to capture all available states. You may need to install the lftp FTP utility and change the path names to match your setup. It’s safe to run this multiple times, as the --only-newer flag compares files and downloads only what isn’t already stored locally.
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# BOM VIC Climate Data Ingestion Script (via lftp)
# Author: Bill K / MS Copilot
# Purpose: Mirror VIC climate data from BOM FTP
# ─────────────────────────────────────────────────────────────
TARGET_DIR="/home/bill/Projects/BOM/Data"
LOGFILE="/home/bill/Projects/BOM/bom_lftp_vic.log"
mkdir -p "$TARGET_DIR"
touch "$LOGFILE"
echo "[$(date)] Starting BOM VIC ingestion..." | tee -a "$LOGFILE"
# 🔌 Connect and mirror using lftp
lftp -u anonymous,anonymous ftp.bom.gov.au <<EOF
set ftp:ssl-allow no
# Navigate to VIC climate data directory - modify this for other states
cd anon/gen/clim_data/IDCKWCDEA0/tables/vic
# Mirror all files on first run, then only newer ones
mirror --verbose \
--only-newer \
--parallel=2 \
--use-pget-n=4 \
. "$TARGET_DIR"
bye
EOF
echo "[$(date)] BOM VIC ingestion complete." | tee -a "$LOGFILE"
Make script executable
$ chmod 755 /home/bill/Projects/BOM/01_Bom_Vic_Ingest.sh
Run script.
$ ./01_Bom_Vic_Ingest.sh
The output is quite verbose; you’ll see lines like this:
...
...
Transferring file `walpeup_research/walpeup_research-202108.csv'
Transferring file `walpeup_research/walpeup_research-202109.csv'
Transferring file `walpeup_research/walpeup_research-202110.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200901.csv'
Transferring file `walpeup_research/walpeup_research-202111.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200902.csv'
Transferring file `walpeup_research/walpeup_research-202112.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200903.csv'
Transferring file `walpeup_research/walpeup_research-202201.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200904.csv'
Transferring file `walpeup_research/walpeup_research-202202.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200905.csv'
...
...
Note below in the Data directory that newly created directories whose names contain special characters, such as brackets, are displayed wrapped in single quotes by the ls command.
$ ls -l
total 1132
drwxr-xr-x 2 bill bill 12288 Aug 26 08:37 aireys_inlet
drwxr-xr-x 2 bill bill 4096 Aug 26 08:41 albury_airport
drwxr-xr-x 2 bill bill 4096 Aug 26 08:39 alice_springs_airport
drwxr-xr-x 2 bill bill 12288 Aug 26 08:37 avalon_airport
drwxr-xr-x 2 bill bill 16384 Aug 26 08:42 bairnsdale_airport
drwxr-xr-x 2 bill bill 16384 Aug 26 08:39 ballarat_aerodrome
drwxr-xr-x 2 bill bill 12288 Aug 26 08:42 bendigo_airport
drwxr-xr-x 2 bill bill 20480 Aug 26 08:41 'breakwater_(geelong_racecourse)'
drwxr-xr-x 2 bill bill 20480 Aug 26 08:38 cape_nelson_lighthouse
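As an aside, Python’s shlex module applies quoting rules similar to ls’s shell-escape style, which is a handy way to see which of these directory names will need quoting in scripts (an illustrative sketch only, not part of the pipeline):

```python
import shlex

# Names made only of "safe" characters pass through untouched;
# names with shell-special characters like parentheses get single-quoted
for name in ["cape_nelson_lighthouse", "breakwater_(geelong_racecourse)"]:
    print(shlex.quote(name))
# cape_nelson_lighthouse
# 'breakwater_(geelong_racecourse)'
```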
Let’s create a cron job so this task runs daily.
$ crontab -e
Modify this line to match your configuration and add it to the crontab. Here’s mine. It runs at 6:30am every day, appending its output, both standard out and standard error, to the cron_ingest.log file. As a later addition I also prepended the date command for better traceability: it runs first and, if it completes (i.e. &&), the ingest script runs.
30 6 * * * date >> /home/bill/Projects/BOM/cron_ingest.log && /home/bill/Projects/BOM/01_Bom_Vic_Ingest.sh >> /home/bill/Projects/BOM/cron_ingest.log 2>&1
Database Configuration and Ingest Script
I’m using PostgreSQL for my industry project, although I could have used any freely available database that runs on the Raspberry Pi’s Debian-based Pi OS.
Check the existing databases to see whether I need to create a new one or already have one ready to go.
In pgAdmin on my Windows laptop I run this SQL command from the query window to list all databases created.
SELECT datname
FROM pg_database
WHERE datistemplate = false
;
Output
"datname"
"postgres"
"nem"
"svr"
"fin"
"bom"
"iot"
"abs"
Let’s use the bom database and build some new tables to contain our data.
DB Tables
Let’s create the tables based on the file structure of the datafiles we downloaded. Example file:
$ cat viewbank-202508.csv
IDCKWCDE61,,,,,,,,,,
Australian Government Bureau of Meteorology,,,,,,,,,,
South Australia,,,,,,,,,,
Daily Evapotranspiration for VIEWBANK Victoria for August 2025,,,,,,,,,,
Issued at 22:31 GMT on Monday 25 August 2025,,,,,,,,,,
Copyright Commonwealth of Australia 2025 Bureau of Meteorology (ABN 92 637 533 532),,,,,,,,,,
Please note Copyright Disclaimer and Privacy Notice <http://www.bom.gov.au/other/copyright.shtml>,,,,,,,,,,
,,Evapo-,,Pan,,,Maximum,Minimum,Average,
,,Transpiration,Rain,Evaporation,Maximum,Minimum,Relative,Relative,10m Wind,Solar
Station Name,Date,0000-2400,0900-0900,0900-0900,Temperature,Temperature,Humidity,Humidity,Speed,Radiation
,,(mm),(mm),(mm),(C),(C),(%),(%),(m/sec),(MJ/sq m)
VIEWBANK,01/08/2025,1.0,0.0, ,14.3,0.3,100,63,1.15,9.29
VIEWBANK,02/08/2025,1.0,0.0, ,16.1,3.6,100,59,0.62,11.32
VIEWBANK,03/08/2025,1.6,0.0, ,19.7,0.4,100,38,1.45,11.85
VIEWBANK,04/08/2025,1.6,0.0, ,18.1,3.3,94,52,2.34,8.81
VIEWBANK,05/08/2025,1.9,3.4, ,16.5,8.5,91,48,3.30,8.38
VIEWBANK,06/08/2025,1.5,0.2, ,15.4,9.3,90,58,2.52,8.54
VIEWBANK,07/08/2025,1.0,0.4, ,13.2,4.0,97,59,0.97,8.54
VIEWBANK,08/08/2025,1.2,0.0, ,15.8,5.9,98,61,1.32,10.95
VIEWBANK,09/08/2025,1.3,0.0, ,17.8,4.7,100,48,0.89,11.73
VIEWBANK,10/08/2025,1.4,0.0, ,18.5,1.6,100,47,0.99,12.74
VIEWBANK,11/08/2025,1.5,0.0, ,18.8,2.0,99,37,1.06,12.47
VIEWBANK,12/08/2025,1.3,0.0, ,15.4,4.0,99,53,1.57,7.92
VIEWBANK,13/08/2025,1.3,7.0, ,14.9,5.8,100,60,1.35,10.42
VIEWBANK,14/08/2025,1.7,0.2, ,19.5,2.5,100,37,1.17,13.28
VIEWBANK,15/08/2025,1.6,0.0, ,14.8,6.2,98,51,3.31,5.87
VIEWBANK,16/08/2025,1.6,9.0, ,12.6,5.7,98,52,2.60,11.63
VIEWBANK,17/08/2025,1.2,1.4, ,13.6,2.1,99,59,1.52,9.49
VIEWBANK,18/08/2025,1.0,0.0, ,11.9,3.4,99,57,1.15,6.94
VIEWBANK,19/08/2025,1.5,0.0, ,17.0,-1.3,100,37,0.97,13.88
VIEWBANK,20/08/2025,1.6,0.0, ,20.1,0.4,96,35,0.81,14.17
VIEWBANK,21/08/2025,1.6,0.2, ,21.7,2.0,96,36,0.59,14.30
VIEWBANK,22/08/2025,1.8,0.0, ,19.4,7.1,92,56,2.03,10.36
VIEWBANK,23/08/2025,1.9,0.0, ,15.8,8.5,96,49,2.13,12.62
VIEWBANK,24/08/2025,1.7,0.0, ,16.2,3.1,98,52,1.55,14.76
Totals:,,34.8,21.8, ,,,,,,
$
It’s a bit messy, as the headings are spread over several comma-separated lines, with a units row underneath. Best to dump the file into Excel and concatenate the headings. See the example Excel file attached.
The headings are:
Station Name
Date
Evapo-Transpiration0000-2400(mm)
Rain0900-0900(mm)
PanEvaporation0900-0900(mm)
MaximumTemperature(°C)
MinimumTemperature(°C)
MaximumRelativeHumidity(%)
MinimumRelativeHumidity(%)
Average10m WindSpeed(m/sec)
SolarRadiation(MJ/sq m)
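As an alternative to hand-concatenating in Excel, the same joining of header rows can be scripted. A sketch in Python, using the four header rows exactly as they appear in the file:

```python
import csv
import io

# The three partial heading rows plus the units row from the BOM file
raw = """,,Evapo-,,Pan,,,Maximum,Minimum,Average,
,,Transpiration,Rain,Evaporation,Maximum,Minimum,Relative,Relative,10m Wind,Solar
Station Name,Date,0000-2400,0900-0900,0900-0900,Temperature,Temperature,Humidity,Humidity,Speed,Radiation
,,(mm),(mm),(mm),(C),(C),(%),(%),(m/sec),(MJ/sq m)"""

rows = list(csv.reader(io.StringIO(raw)))
# Transpose so each column's fragments line up, then glue them together
headers = ["".join(parts).strip() for parts in zip(*rows)]
print(headers[2])  # Evapo-Transpiration0000-2400(mm)
```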
From the headings we can build a table with column names resembling the headings in the BOM CSV files. I’m making the primary key the combination of station_name and date, as together these uniquely identify every row. For traceability, it’s handy to add a column naming the source file of the data; see the source_filename TEXT column. In SQL:
CREATE TABLE bom_vic_climate (
station_name TEXT NOT NULL,
date DATE NOT NULL,
evapotranspiration_mm REAL,
rain_mm REAL,
pan_evaporation_mm REAL,
max_temp_c REAL,
min_temp_c REAL,
max_rh_percent REAL,
min_rh_percent REAL,
avg_wind_speed_mps REAL,
radiation_mj_m2 REAL,
source_filename TEXT,
PRIMARY KEY (station_name, date)
);
We’ll be ingesting quite a bit of data, so to speed up queries and presentation panels in Grafana we can create some indexes. In SQL:
CREATE INDEX idx_bom_vic_date ON bom_vic_climate(date);
CREATE INDEX idx_bom_vic_station ON bom_vic_climate(station_name);
Python Ingest Script
I’m learning Python. We’re using four libraries here:
- os – access to operating-system functionality such as paths and directory listings
- io – for working with streams like files and in-memory buffers
- psycopg2 – the driver that connects the script to PostgreSQL
- pandas – to read, clean, transform and structure tabular data
#!/usr/bin/env python3
import os
import io
import psycopg2
import pandas as pd
# ─────────────────────────────────────────────────────────────
# BOM VIC Climate Data Loader for PostgreSQL
# Author: Bill's Modular Lab / MS Copilot
# Purpose: Parse cleaned BOM CSV files and insert into PostgreSQL
# ─────────────────────────────────────────────────────────────
DATA_DIR = "/home/bill/Projects/BOM/Data"
DB_PARAMS = {
"dbname": "bom",
"user": "postgres",
"host": "localhost",
"port": 5432
}
# 🧹 Clean and normalize BOM file
def clean_file(filepath, filename):
try:
with open(filepath, 'r', encoding='utf-8') as f:
lines = f.readlines()
except UnicodeDecodeError:
print(f"⚠️ UTF-8 failed, retrying with ISO-8859-1: {filepath}")
with open(filepath, 'r', encoding='iso-8859-1') as f:
lines = f.readlines()
# 🔍 Find header line
header_index = next(
(i for i, line in enumerate(lines) if "Station Name" in line and "Date" in line),
None
)
if header_index is None:
raise ValueError(f"No valid header found in {filepath}")
# ✂️ Extract header + data block
clean_lines = lines[header_index:]
# 🧾 Load into DataFrame
df = pd.read_csv(io.StringIO(''.join(clean_lines)), sep=',')
# 🏷️ Rename columns to match DB schema
df.columns = [
'station_name', 'date', 'evapotranspiration_mm', 'rain_mm', 'pan_evaporation_mm',
'max_temp_c', 'min_temp_c', 'max_rh_percent', 'min_rh_percent',
'avg_wind_speed_mps', 'radiation_mj_m2'
]
# 🗓️ Normalize date
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
df.dropna(subset=['date'], inplace=True)
df = df[df['station_name'].notna()]
# 🔢 Convert numeric fields
for col in df.columns:
if col not in ['station_name', 'date']:
df[col] = pd.to_numeric(df[col], errors='coerce')
# 🧾 Add filename for traceability
df['source_filename'] = filename
return df
# 🧩 Insert rows into PostgreSQL
def insert_to_db(df, conn):
with conn.cursor() as cur:
print(f"🧪 Available columns: {df.columns.tolist()}")
inserted = 0
for _, row in df.iterrows():
try:
cur.execute("""
INSERT INTO bom_vic_climate (
station_name, date, evapotranspiration_mm, rain_mm, pan_evaporation_mm,
max_temp_c, min_temp_c, max_rh_percent, min_rh_percent,
avg_wind_speed_mps, radiation_mj_m2, source_filename
) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
ON CONFLICT (station_name, date) DO NOTHING;
""", (
row['station_name'], row['date'],
row['evapotranspiration_mm'],
row['rain_mm'],
row['pan_evaporation_mm'],
row['max_temp_c'],
row['min_temp_c'],
row['max_rh_percent'],
row['min_rh_percent'],
row['avg_wind_speed_mps'],
row['radiation_mj_m2'],
row['source_filename']
))
inserted += 1
except Exception as e:
print(f"❌ Insert error: {e}")
print(f"✅ Inserted {inserted} rows.")
conn.commit()
# 🚀 Main loop
def main():
print("🔗 Connecting to database...")
conn = psycopg2.connect(**DB_PARAMS)
print("✅ Connected.")
for station_dir in os.listdir(DATA_DIR):
station_path = os.path.join(DATA_DIR, station_dir)
if os.path.isdir(station_path):
print(f"📂 Processing station: {station_dir}")
for file in os.listdir(station_path):
filepath = os.path.join(station_path, file)
if file.lower().endswith(".csv"):
print(f"📄 Found file: {filepath}")
try:
df = clean_file(filepath, file)
print(f"✅ Parsed {len(df)} rows from {filepath}")
df['station_name'] = station_dir
insert_to_db(df, conn)
except Exception as e:
print(f"❌ Error processing {filepath}: {e}")
else:
print(f"⏭️ Skipping non-CSV file: {file}")
conn.close()
print("🔒 Database connection closed.")
if __name__ == "__main__":
main()
Make script executable
$ chmod 755 02_Load_Bom_Vic_to_DB.py
It worked after many iterations of testing and updating the Python script. There were issues with:
- Different encoding standards and unrecognisable text.
- Database credentials like names and passwords
- Differing data file formats from the BOM
- Invisible and malformed characters
- Data headers buried under BOM metadata
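For the invisible and malformed character issues, a small diagnostic helps show where a file departs from plain ASCII. A minimal sketch (not part of the pipeline; the sample string is illustrative):

```python
def find_non_ascii(data: bytes):
    """Return (offset, byte) pairs for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]

# A degree sign encoded as ISO-8859-1 is one such byte (0xB0 = 176)
sample = "MaximumTemperature(°C)".encode("iso-8859-1")
print(find_non_ascii(sample))  # [(19, 176)]
```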
Data Quality and Diagnostics
Rows missing metrics
Test every BOM station name to see how many rows of data it has and the date of its last recorded entry. Sorted oldest date first to highlight the stalest data.
SELECT
station_name,
MAX(date) AS last_observation_date,
COUNT(*) AS total_rows
FROM bom_vic_climate
GROUP BY station_name
ORDER BY last_observation_date, station_name
;
Output…
station_name | last_observation_date | total_rows
-------------------------------------+-----------------------+------------
grovedale_(geelong_airport) | 2011-06-01 | 882
ferny_creek_(dunns_hill) | 2011-07-25 | 936
kyabram_dpi | 2012-03-18 | 1142
viewbank_(arpansa) | 2012-03-18 | 1173
melbourne_(150_lonsdale_st) | 2013-06-01 | 32
melbourne_(olympic_park)_comparison | 2014-01-27 | 241
melbourne_regional_office | 2015-01-05 | 2196
st_kilda_harbour_-_rmys | 2016-05-20 | 20
alice_springs_airport | 2017-03-14 | 14
glenlitta_avenue_-_micromac_site | 2017-11-24 | 46
east_sale | 2018-03-24 | 200
portland_ntc | 2018-03-24 | 2531
frankston | 2018-08-14 | 3507
albury_airport | 2019-09-29 | 60
deniliquin_airport | 2019-09-29 | 60
coldstream_comparison | 2020-11-05 | 4101
sheoaks | 2022-08-06 | 4959
portland_(cashmore_airport) | 2023-05-06 | 5223
corner_inlet_(yanakie) | 2025-07-03 | 4523
kanagulk | 2025-07-03 | 6003
aireys_inlet | 2025-08-24 | 6073
avalon_airport | 2025-08-24 | 6075
bairnsdale_airport | 2025-08-24 | 6074
ballarat_aerodrome | 2025-08-24 | 6069
bendigo_airport | 2025-08-24 | 6061
breakwater_(geelong_racecourse) | 2025-08-24 | 5189
cape_nelson_lighthouse | 2025-08-24 | 6058
cape_otway_lighthouse | 2025-08-24 | 6068
casterton | 2025-08-24 | 6058
cerberus | 2025-08-24 | 6059
charlton | 2025-08-24 | 6074
colac_(mount_gellibrand) | 2025-08-24 | 6056
coldstream | 2025-08-24 | 6075
combienbar | 2025-08-24 | 6040
dartmoor | 2025-08-24 | 6008
dinner_plain_(mount_hotham_airport) | 2025-08-24 | 6061
east_sale_airport | 2025-08-24 | 5811
edenhope_airport | 2025-08-24 | 6066
eildon_fire_tower | 2025-08-24 | 6075
essendon_airport | 2025-08-24 | 6075
falls_creek | 2025-08-24 | 6013
ferny_creek | 2025-08-24 | 5285
frankston_(ballam_park) | 2025-08-24 | 1881
gabo_island_lighthouse | 2025-08-24 | 6066
gelantipy | 2025-08-24 | 6051
grampians_(mount_william) | 2025-08-24 | 5942
hamilton_airport | 2025-08-24 | 6068
hopetoun_airport | 2025-08-24 | 6062
horsham_aerodrome | 2025-08-24 | 6073
hunters_hill | 2025-08-24 | 6015
kellalac_(warracknabeal_airport) | 2025-08-24 | 3178
kerang_airport | 2025-08-24 | 55
kyabram | 2025-08-24 | 4912
laverton_raaf | 2025-08-24 | 6074
longerenong | 2025-08-24 | 6066
mallacoota | 2025-08-24 | 6060
mangalore_airport | 2025-08-24 | 6073
melbourne_airport | 2025-08-24 | 6075
melbourne_(olympic_park) | 2025-08-24 | 4249
mildura_airport | 2025-08-24 | 6074
moorabbin_airport | 2025-08-24 | 6075
mortlake_racecourse | 2025-08-24 | 6075
morwell_(latrobe_valley_airport) | 2025-08-24 | 6069
mount_baw_baw | 2025-08-24 | 5936
mount_buller | 2025-08-24 | 6005
mount_hotham | 2025-08-24 | 5703
mount_moornapa | 2025-08-24 | 6049
mount_nowa_nowa | 2025-08-24 | 6066
nhill_aerodrome | 2025-08-24 | 6069
omeo | 2025-08-24 | 6072
orbost | 2025-08-24 | 6059
point_cook_raaf | 2025-08-24 | 1360
port_fairy | 2025-08-24 | 6074
portland_airport | 2025-08-24 | 847
pound_creek | 2025-08-24 | 6032
puckapunyal_lyon_hill_(defence) | 2025-08-24 | 2520
puckapunyal_west_(defence) | 2025-08-24 | 2520
pyrenees_(ben_nevis) | 2025-08-24 | 6042
redesdale | 2025-08-24 | 6064
rhyll | 2025-08-24 | 6072
rutherglen_research | 2025-08-24 | 6062
scoresby_research_institute | 2025-08-24 | 6068
she_oaks | 2025-08-24 | 1120
shepparton_airport | 2025-08-24 | 6071
stawell_aerodrome | 2025-08-24 | 6067
swan_hill_aerodrome | 2025-08-24 | 6073
tatura_inst_sustainable_ag | 2025-08-24 | 5480
viewbank | 2025-08-24 | 4920
wallan_(kilmore_gap) | 2025-08-24 | 6075
walpeup_research | 2025-08-24 | 6072
wangaratta_aero | 2025-08-24 | 6053
warrnambool_airport_ndb | 2025-08-24 | 6074
westmere | 2025-08-24 | 6059
wilsons_promontory_lighthouse | 2025-08-24 | 6035
yarram_airport | 2025-08-24 | 6072
yarrawonga | 2025-08-24 | 6063
(96 rows)
Future Dates
Check for broken clocks at BOM sites stamping a future date and time on collected metrics.
SELECT *
FROM bom_vic_climate
WHERE date > CURRENT_DATE;
Output…
station_name | date | evapotranspiration_mm | rain_mm | pan_evaporation_mm | max_temp_c | min_temp_c | max_rh_percent | min_rh_percent | avg_wind_speed_mps | radiation_mj_m2 | source_filename
--------------+------+-----------------------+---------+--------------------+------------+------------+----------------+----------------+--------------------+-----------------+-----------------
(0 rows)
Data File Check
Count the rows of data loaded from each BOM data file.
SELECT
source_filename,
COUNT(*) AS row_count
FROM bom_vic_climate
GROUP BY source_filename
ORDER BY row_count DESC,source_filename asc
;
Output…
source_filename | row_count
------------------------------------------------+-----------
aireys_inlet-200901.csv | 31
aireys_inlet-200903.csv | 31
aireys_inlet-200905.csv | 31
aireys_inlet-200907.csv | 31
...
...
dinner_plain_(mount_hotham_airport)-202503.csv | 31
dinner_plain_(mount_hotham_airport)-202505.csv | 31
dinner_plain_(mount_hotham_airport)-202507.csv | 31
east_sale-201710.csv | 31
...
...
mount_hotham-201506.csv | 12
pound_creek-202310.csv | 11
pyrenees_(ben_nevis)-201602.csv | 11
wilsons_promontory_lighthouse-201201.csv | 8
east_sale_airport-201709.csv | 7
glenlitta_avenue_-_micromac_site-201408.csv | 7
grampians_(mount_william)-201806.csv | 7
grovedale_(geelong_airport)-201106.csv | 7
kanagulk-202507.csv | 7
melbourne_(150_lonsdale_st)-201306.csv | 7
mount_hotham-202309.csv | 7
portland_(cashmore_airport)-202305.csv | 7
sheoaks-202208.csv | 7
coldstream_comparison-202011.csv | 5
melbourne_regional_office-201501.csv | 5
corner_inlet_(yanakie)-202507.csv | 3
mount_hotham-202401.csv | 2
(15071 rows)
Null / NaN Count Check by Column
This query checks which columns have the most NaN values (nulls are stored as NaN, short for Not a Number, by the Python pandas process). If you’re looking for true NULL values, change “= 'NaN'” to “IS NULL“.
SELECT
COUNT(*) FILTER (WHERE evapotranspiration_mm = 'NaN') AS evapotranspiration_nulls,
COUNT(*) FILTER (WHERE rain_mm = 'NaN') AS rain_nulls,
COUNT(*) FILTER (WHERE pan_evaporation_mm = 'NaN') AS pan_evaporation_nulls,
COUNT(*) FILTER (WHERE max_temp_c = 'NaN') AS max_temp_nulls,
COUNT(*) FILTER (WHERE min_temp_c = 'NaN') AS min_temp_nulls,
COUNT(*) FILTER (WHERE max_rh_percent = 'NaN') AS max_rh_nulls,
COUNT(*) FILTER (WHERE min_rh_percent = 'NaN') AS min_rh_nulls,
COUNT(*) FILTER (WHERE avg_wind_speed_mps = 'NaN') AS wind_nulls,
COUNT(*) FILTER (WHERE radiation_mj_m2 = 'NaN') AS radiation_nulls,
COUNT(*) FILTER (WHERE source_filename = 'NaN') AS filename_nulls
FROM bom_vic_climate
;
Output
evapotranspiration_nulls | rain_nulls | pan_evaporation_nulls | max_temp_nulls | min_temp_nulls | max_rh_nulls | min_rh_nulls | wind_nulls | radiation_nulls | filename_nulls
--------------------------+------------+-----------------------+----------------+----------------+--------------+--------------+------------+-----------------+----------------
13848 | 12490 | 440510 | 6173 | 5320 | 1863 | 1863 | 1863 | 4528 | 0
(1 row)
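Those NaNs exist because pandas represents missing numerics as the float NaN, and psycopg2 passes that straight through into the REAL columns as Postgres’s special 'NaN' value rather than SQL NULL. If true NULLs are preferred, the values can be swapped to None before insert, since psycopg2 maps None to NULL. A sketch of the idea, assuming a DataFrame shaped like the loader’s:

```python
import math

import pandas as pd

df = pd.DataFrame({"station_name": ["VIEWBANK", "VIEWBANK"],
                   "rain_mm": [0.4, float("nan")]})

# Build insert tuples with NaN replaced by None (psycopg2 renders None as NULL)
records = [
    tuple(None if (isinstance(v, float) and math.isnan(v)) else v for v in row)
    for row in df.itertuples(index=False, name=None)
]
print(records)  # [('VIEWBANK', 0.4), ('VIEWBANK', None)]
```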
Null / NaN Count Check by Column and Station Name
Same as the above, but broken down by station name.
SELECT
station_name,
COUNT(*) FILTER (WHERE rain_mm = 'NaN') AS rain_nulls,
COUNT(*) FILTER (WHERE pan_evaporation_mm = 'NaN') AS pan_evaporation_nulls,
COUNT(*) FILTER (WHERE max_temp_c = 'NaN') AS max_temp_nulls,
COUNT(*) FILTER (WHERE min_temp_c = 'NaN') AS min_temp_nulls,
COUNT(*) FILTER (WHERE max_rh_percent = 'NaN') AS max_rh_nulls,
COUNT(*) FILTER (WHERE min_rh_percent = 'NaN') AS min_rh_nulls,
COUNT(*) FILTER (WHERE avg_wind_speed_mps = 'NaN') AS wind_nulls,
COUNT(*) FILTER (WHERE radiation_mj_m2 = 'NaN') AS radiation_nulls,
COUNT(*) FILTER (WHERE source_filename = 'NaN') AS filename_nulls
FROM bom_vic_climate
GROUP BY station_name
ORDER BY station_name
;
Output…
station_name | rain_nulls | pan_evaporation_nulls | max_temp_nulls | min_temp_nulls | max_rh_nulls | min_rh_nulls | wind_nulls | radiation_nulls | filename_nulls
-------------------------------------+------------+-----------------------+----------------+----------------+--------------+--------------+------------+-----------------+----------------
aireys_inlet | 61 | 6073 | 32 | 28 | 18 | 18 | 18 | 43 | 0
albury_airport | 0 | 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0
alice_springs_airport | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0
avalon_airport | 47 | 6075 | 49 | 43 | 2 | 2 | 2 | 43 | 0
bairnsdale_airport | 25 | 6074 | 27 | 13 | 11 | 11 | 11 | 43 | 0
ballarat_aerodrome | 25 | 6069 | 33 | 18 | 2 | 2 | 2 | 44 | 0
bendigo_airport | 36 | 6061 | 34 | 25 | 12 | 12 | 12 | 43 | 0
breakwater_(geelong_racecourse) | 68 | 5189 | 111 | 83 | 14 | 14 | 14 | 15 | 0
cape_nelson_lighthouse | 19 | 6058 | 21 | 12 | 3 | 3 | 3 | 42 | 0
cape_otway_lighthouse | 71 | 6068 | 74 | 60 | 20 | 20 | 20 | 43 | 0
casterton | 29 | 6058 | 31 | 28 | 18 | 18 | 18 | 42 | 0
cerberus | 32 | 6059 | 75 | 59 | 15 | 15 | 15 | 42 | 0
charlton | 39 | 6074 | 24 | 16 | 8 | 8 | 8 | 42 | 0
colac_(mount_gellibrand) | 155 | 6056 | 53 | 50 | 12 | 12 | 12 | 42 | 0
coldstream | 30 | 6075 | 26 | 19 | 15 | 15 | 15 | 42 | 0
coldstream_comparison | 157 | 4101 | 147 | 139 | 84 | 84 | 84 | 49 | 0
...
...
wilsons_promontory_lighthouse | 114 | 6035 | 104 | 91 | 50 | 50 | 50 | 42 | 0
yarram_airport | 24 | 6072 | 20 | 15 | 10 | 10 | 10 | 42 | 0
yarrawonga | 99 | 6063 | 106 | 88 | 44 | 44 | 44 | 42 | 0
(96 rows)