Purpose
The purpose of this project is to ingest publicly available BOM environment data, load it into a database and present the findings using a BI dashboard. It must update daily by checking the data sources for new files.
It’s also designed to showcase my skills as a data analyst working across Unix/Linux, databases, BI tools, Excel and Python.
Scope
Audience
< Show diagram – The plan >
Identify data source and where files are kept. Look for dictionary and online help.
Link to BOM sources:
https://www.bom.gov.au/catalogue/anon-ftp.shtml
https://www.bom.gov.au/water/regulations/dataDelivery/howto/ftpAccount.shtml
Shell Script to Download Data
Below is a bash script to perform a recursive download of all CSV files from the BOM sites. I’m limiting myself to VIC BOM sites, but the script can be expanded to capture all available states. You may need to install the lftp FTP utility and change the path names to match your setup. It’s safe to run this multiple times, as the --only-newer flag compares files and downloads only what isn’t already stored locally.
#!/bin/bash
# ─────────────────────────────────────────────────────────────
# BOM VIC Climate Data Ingestion Script (via lftp)
# Author: Bill K / MS Copilot
# Purpose: Mirror VIC climate data from BOM FTP
# ─────────────────────────────────────────────────────────────
TARGET_DIR="/home/bill/Projects/BOM/Data"
LOGFILE="/home/bill/Projects/BOM/bom_lftp_vic.log"
mkdir -p "$TARGET_DIR"
touch "$LOGFILE"
echo "[$(date)] Starting BOM VIC ingestion..." | tee -a "$LOGFILE"
# 🔌 Connect and mirror using lftp
lftp -u anonymous,anonymous ftp.bom.gov.au <<EOF
set ftp:ssl-allow no
# Navigate to VIC climate data directory - modify this for other states
cd anon/gen/clim_data/IDCKWCDEA0/tables/vic
# Mirror all files on first run, then only newer ones
mirror --verbose \
--only-newer \
--parallel=2 \
--use-pget-n=4 \
. "$TARGET_DIR"
bye
EOF
echo "[$(date)] BOM VIC ingestion complete." | tee -a "$LOGFILE"
Make script executable
$ chmod 755 /home/bill/Projects/BOM/01_Bom_Vic_Ingest.sh
Run script.
$ ./01_Bom_Vic_Ingest.sh
The output is quite verbose; you’ll see lines like this:
...
...
Transferring file `walpeup_research/walpeup_research-202108.csv'
Transferring file `walpeup_research/walpeup_research-202109.csv'
Transferring file `walpeup_research/walpeup_research-202110.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200901.csv'
Transferring file `walpeup_research/walpeup_research-202111.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200902.csv'
Transferring file `walpeup_research/walpeup_research-202112.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200903.csv'
Transferring file `walpeup_research/walpeup_research-202201.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200904.csv'
Transferring file `walpeup_research/walpeup_research-202202.csv'
Transferring file `wangaratta_aero/wangaratta_aero-200905.csv'
...
...
Note below in the Data directory that newly created directories whose names contain special characters, such as brackets, are displayed wrapped in single quotes by the ls command.
$ ls -l
total 1132
drwxr-xr-x 2 bill bill 12288 Aug 26 08:37 aireys_inlet
drwxr-xr-x 2 bill bill 4096 Aug 26 08:41 albury_airport
drwxr-xr-x 2 bill bill 4096 Aug 26 08:39 alice_springs_airport
drwxr-xr-x 2 bill bill 12288 Aug 26 08:37 avalon_airport
drwxr-xr-x 2 bill bill 16384 Aug 26 08:42 bairnsdale_airport
drwxr-xr-x 2 bill bill 16384 Aug 26 08:39 ballarat_aerodrome
drwxr-xr-x 2 bill bill 12288 Aug 26 08:42 bendigo_airport
drwxr-xr-x 2 bill bill 20480 Aug 26 08:41 'breakwater_(geelong_racecourse)'
drwxr-xr-x 2 bill bill 20480 Aug 26 08:38 cape_nelson_lighthouse
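As an aside, Python’s shlex module applies quoting rules similar to ls’s shell-escape style, which is a handy way to see which of these directory names will need quoting in scripts (an illustrative sketch only, not part of the pipeline):

```python
import shlex

# Names made only of "safe" characters pass through untouched;
# names with shell-special characters like parentheses get single-quoted
for name in ["cape_nelson_lighthouse", "breakwater_(geelong_racecourse)"]:
    print(shlex.quote(name))
# cape_nelson_lighthouse
# 'breakwater_(geelong_racecourse)'
```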
Let’s create a cron job so this task runs daily.
$ crontab -e
Modify this line to match your configuration and add it to the crontab. Here’s mine. It runs at 6:30am every day, appending its output, both standard out and standard error, to the cron_ingest.log file. As a later addition I also prepended the date command for better traceability: it runs first and, if it completes (i.e. &&), the ingest script runs.
30 6 * * * date >> /home/bill/Projects/BOM/cron_ingest.log && /home/bill/Projects/BOM/01_Bom_Vic_Ingest.sh >> /home/bill/Projects/BOM/cron_ingest.log 2>&1
Database Configuration and Ingest Script
I’m using PostgreSQL for my industry project, although I could have used any freely available database that runs on the Raspberry Pi’s Debian-based Pi OS.
Check the existing databases to see whether I need to create a new one or already have one ready to go.
In pgAdmin on my Windows laptop I run this SQL command from the query window to list all databases created.
SELECT datname
FROM pg_database
WHERE datistemplate = false
;
Output
"datname"
"postgres"
"nem"
"svr"
"fin"
"bom"
"iot"
"abs"
Let’s use the bom database and build some new tables to contain our data.
DB Tables
Let’s create the tables based on the file structure of the datafiles we downloaded. Example file:
$ cat viewbank-202508.csv
IDCKWCDE61,,,,,,,,,,
Australian Government Bureau of Meteorology,,,,,,,,,,
South Australia,,,,,,,,,,
Daily Evapotranspiration for VIEWBANK Victoria for August 2025,,,,,,,,,,
Issued at 22:31 GMT on Monday 25 August 2025,,,,,,,,,,
Copyright Commonwealth of Australia 2025 Bureau of Meteorology (ABN 92 637 533 532),,,,,,,,,,
Please note Copyright Disclaimer and Privacy Notice <http://www.bom.gov.au/other/copyright.shtml>,,,,,,,,,,
,,Evapo-,,Pan,,,Maximum,Minimum,Average,
,,Transpiration,Rain,Evaporation,Maximum,Minimum,Relative,Relative,10m Wind,Solar
Station Name,Date,0000-2400,0900-0900,0900-0900,Temperature,Temperature,Humidity,Humidity,Speed,Radiation
,,(mm),(mm),(mm),(C),(C),(%),(%),(m/sec),(MJ/sq m)
VIEWBANK,01/08/2025,1.0,0.0, ,14.3,0.3,100,63,1.15,9.29
VIEWBANK,02/08/2025,1.0,0.0, ,16.1,3.6,100,59,0.62,11.32
VIEWBANK,03/08/2025,1.6,0.0, ,19.7,0.4,100,38,1.45,11.85
VIEWBANK,04/08/2025,1.6,0.0, ,18.1,3.3,94,52,2.34,8.81
VIEWBANK,05/08/2025,1.9,3.4, ,16.5,8.5,91,48,3.30,8.38
VIEWBANK,06/08/2025,1.5,0.2, ,15.4,9.3,90,58,2.52,8.54
VIEWBANK,07/08/2025,1.0,0.4, ,13.2,4.0,97,59,0.97,8.54
VIEWBANK,08/08/2025,1.2,0.0, ,15.8,5.9,98,61,1.32,10.95
VIEWBANK,09/08/2025,1.3,0.0, ,17.8,4.7,100,48,0.89,11.73
VIEWBANK,10/08/2025,1.4,0.0, ,18.5,1.6,100,47,0.99,12.74
VIEWBANK,11/08/2025,1.5,0.0, ,18.8,2.0,99,37,1.06,12.47
VIEWBANK,12/08/2025,1.3,0.0, ,15.4,4.0,99,53,1.57,7.92
VIEWBANK,13/08/2025,1.3,7.0, ,14.9,5.8,100,60,1.35,10.42
VIEWBANK,14/08/2025,1.7,0.2, ,19.5,2.5,100,37,1.17,13.28
VIEWBANK,15/08/2025,1.6,0.0, ,14.8,6.2,98,51,3.31,5.87
VIEWBANK,16/08/2025,1.6,9.0, ,12.6,5.7,98,52,2.60,11.63
VIEWBANK,17/08/2025,1.2,1.4, ,13.6,2.1,99,59,1.52,9.49
VIEWBANK,18/08/2025,1.0,0.0, ,11.9,3.4,99,57,1.15,6.94
VIEWBANK,19/08/2025,1.5,0.0, ,17.0,-1.3,100,37,0.97,13.88
VIEWBANK,20/08/2025,1.6,0.0, ,20.1,0.4,96,35,0.81,14.17
VIEWBANK,21/08/2025,1.6,0.2, ,21.7,2.0,96,36,0.59,14.30
VIEWBANK,22/08/2025,1.8,0.0, ,19.4,7.1,92,56,2.03,10.36
VIEWBANK,23/08/2025,1.9,0.0, ,15.8,8.5,96,49,2.13,12.62
VIEWBANK,24/08/2025,1.7,0.0, ,16.2,3.1,98,52,1.55,14.76
Totals:,,34.8,21.8, ,,,,,,
$
It’s a bit messy, as the headings are spread over several comma-separated lines, with a units row underneath. Best to dump the file into Excel and concatenate the headings. See the example Excel file attached.
The headings are:
Station Name
Date
Evapo-Transpiration0000-2400(mm)
Rain0900-0900(mm)
PanEvaporation0900-0900(mm)
MaximumTemperature(°C)
MinimumTemperature(°C)
MaximumRelativeHumidity(%)
MinimumRelativeHumidity(%)
Average10m WindSpeed(m/sec)
SolarRadiation(MJ/sq m)
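As an alternative to hand-concatenating in Excel, the same joining of header rows can be scripted. A sketch in Python, using the four header rows exactly as they appear in the file:

```python
import csv
import io

# The three partial heading rows plus the units row from the BOM file
raw = """,,Evapo-,,Pan,,,Maximum,Minimum,Average,
,,Transpiration,Rain,Evaporation,Maximum,Minimum,Relative,Relative,10m Wind,Solar
Station Name,Date,0000-2400,0900-0900,0900-0900,Temperature,Temperature,Humidity,Humidity,Speed,Radiation
,,(mm),(mm),(mm),(C),(C),(%),(%),(m/sec),(MJ/sq m)"""

rows = list(csv.reader(io.StringIO(raw)))
# Transpose so each column's fragments line up, then glue them together
headers = ["".join(parts).strip() for parts in zip(*rows)]
print(headers[2])  # Evapo-Transpiration0000-2400(mm)
```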
From the headings we can build a table with column names resembling the headings in the BOM CSV files. I’m making the primary key the combination of station_name and date, as together these uniquely identify every row. For traceability, it’s handy to add a column naming the source file of the data; see the source_filename TEXT column. In SQL:
CREATE TABLE bom_vic_climate (
station_name TEXT NOT NULL,
date DATE NOT NULL,
evapotranspiration_mm REAL,
rain_mm REAL,
pan_evaporation_mm REAL,
max_temp_c REAL,
min_temp_c REAL,
max_rh_percent REAL,
min_rh_percent REAL,
avg_wind_speed_mps REAL,
radiation_mj_m2 REAL,
source_filename TEXT,
PRIMARY KEY (station_name, date)
);
We’ll be ingesting quite a bit of data, so to speed up queries and presentation panels in Grafana we can create some indexes. In SQL:
CREATE INDEX idx_bom_vic_date ON bom_vic_climate(date);
CREATE INDEX idx_bom_vic_station ON bom_vic_climate(station_name);
Python Ingest Script
I’m learning Python. We’re using four libraries here:
- os – access to operating-system functionality such as paths and directory listings
- io – for working with streams like files and in-memory buffers
- psycopg2 – the driver that connects the script to PostgreSQL
- pandas – to read, clean, transform and structure tabular data
#!/usr/bin/env python3
import os
import io
import psycopg2
import pandas as pd
# ─────────────────────────────────────────────────────────────
# BOM VIC Climate Data Loader for PostgreSQL
# Author: Bill's Modular Lab / MS Copilot
# Purpose: Parse cleaned BOM CSV files and insert into PostgreSQL
# ─────────────────────────────────────────────────────────────
DATA_DIR = "/home/bill/Projects/BOM/Data"
DB_PARAMS = {
"dbname": "bom",
"user": "postgres",
"host": "localhost",
"port": 5432
}
# 🧹 Clean and normalize BOM file
def clean_file(filepath, filename):
try:
with open(filepath, 'r', encoding='utf-8') as f:
lines = f.readlines()
except UnicodeDecodeError:
print(f"⚠️ UTF-8 failed, retrying with ISO-8859-1: {filepath}")
with open(filepath, 'r', encoding='iso-8859-1') as f:
lines = f.readlines()
# 🔍 Find header line
header_index = next(
(i for i, line in enumerate(lines) if "Station Name" in line and "Date" in line),
None
)
if header_index is None:
raise ValueError(f"No valid header found in {filepath}")
# ✂️ Extract header + data block
clean_lines = lines[header_index:]
# 🧾 Load into DataFrame
df = pd.read_csv(io.StringIO(''.join(clean_lines)), sep=',')
# 🏷️ Rename columns to match DB schema
df.columns = [
'station_name', 'date', 'evapotranspiration_mm', 'rain_mm', 'pan_evaporation_mm',
'max_temp_c', 'min_temp_c', 'max_rh_percent', 'min_rh_percent',
'avg_wind_speed_mps', 'radiation_mj_m2'
]
# 🗓️ Normalize date
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
df.dropna(subset=['date'], inplace=True)
df = df[df['station_name'].notna()]
# 🔢 Convert numeric fields
for col in df.columns:
if col not in ['station_name', 'date']:
df[col] = pd.to_numeric(df[col], errors='coerce')
# 🧾 Add filename for traceability
df['source_filename'] = filename
return df
# 🧩 Insert rows into PostgreSQL
def insert_to_db(df, conn):
with conn.cursor() as cur:
print(f"🧪 Available columns: {df.columns.tolist()}")
inserted = 0
for _, row in df.iterrows():
try:
cur.execute("""
INSERT INTO bom_vic_climate (
station_name, date, evapotranspiration_mm, rain_mm, pan_evaporation_mm,
max_temp_c, min_temp_c, max_rh_percent, min_rh_percent,
avg_wind_speed_mps, radiation_mj_m2, source_filename
) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
ON CONFLICT (station_name, date) DO NOTHING;
""", (
row['station_name'], row['date'],
row['evapotranspiration_mm'],
row['rain_mm'],
row['pan_evaporation_mm'],
row['max_temp_c'],
row['min_temp_c'],
row['max_rh_percent'],
row['min_rh_percent'],
row['avg_wind_speed_mps'],
row['radiation_mj_m2'],
row['source_filename']
))
inserted += 1
except Exception as e:
print(f"❌ Insert error: {e}")
print(f"✅ Inserted {inserted} rows.")
conn.commit()
# 🚀 Main loop
def main():
print("🔗 Connecting to database...")
conn = psycopg2.connect(**DB_PARAMS)
print("✅ Connected.")
for station_dir in os.listdir(DATA_DIR):
station_path = os.path.join(DATA_DIR, station_dir)
if os.path.isdir(station_path):
print(f"📂 Processing station: {station_dir}")
for file in os.listdir(station_path):
filepath = os.path.join(station_path, file)
if file.lower().endswith(".csv"):
print(f"📄 Found file: {filepath}")
try:
df = clean_file(filepath, file)
print(f"✅ Parsed {len(df)} rows from {filepath}")
df['station_name'] = station_dir
insert_to_db(df, conn)
except Exception as e:
print(f"❌ Error processing {filepath}: {e}")
else:
print(f"⏭️ Skipping non-CSV file: {file}")
conn.close()
print("🔒 Database connection closed.")
if __name__ == "__main__":
main()
Make script executable
$ chmod 755 02_Load_Bom_Vic_to_DB.py
It worked after many iterations of testing and updating the Python script. There were issues with:
- Different encoding standards and unrecognisable text.
- Database credentials like names and passwords
- Differing data file formats from the BOM
- Invisible and malformed characters
- Data headers buried under BOM metadata
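For the invisible and malformed character issues, a small diagnostic helps show where a file departs from plain ASCII. A minimal sketch (not part of the pipeline; the sample string is illustrative):

```python
def find_non_ascii(data: bytes):
    """Return (offset, byte) pairs for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]

# A degree sign encoded as ISO-8859-1 is one such byte (0xB0 = 176)
sample = "MaximumTemperature(°C)".encode("iso-8859-1")
print(find_non_ascii(sample))  # [(19, 176)]
```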
Data Quality and Diagnostics
Rows missing metrics
Test every BOM station name to see how many rows of data it has and the date of its last recorded entry. Sorted oldest date first to highlight the stalest data.
SELECT
station_name,
MAX(date) AS last_observation_date,
COUNT(*) AS total_rows
FROM bom_vic_climate
GROUP BY station_name
ORDER BY last_observation_date, station_name
;
Output…
station_name | last_observation_date | total_rows
-------------------------------------+-----------------------+------------
grovedale_(geelong_airport) | 2011-06-01 | 882
ferny_creek_(dunns_hill) | 2011-07-25 | 936
kyabram_dpi | 2012-03-18 | 1142
viewbank_(arpansa) | 2012-03-18 | 1173
melbourne_(150_lonsdale_st) | 2013-06-01 | 32
melbourne_(olympic_park)_comparison | 2014-01-27 | 241
melbourne_regional_office | 2015-01-05 | 2196
st_kilda_harbour_-_rmys | 2016-05-20 | 20
alice_springs_airport | 2017-03-14 | 14
glenlitta_avenue_-_micromac_site | 2017-11-24 | 46
east_sale | 2018-03-24 | 200
portland_ntc | 2018-03-24 | 2531
frankston | 2018-08-14 | 3507
albury_airport | 2019-09-29 | 60
deniliquin_airport | 2019-09-29 | 60
coldstream_comparison | 2020-11-05 | 4101
sheoaks | 2022-08-06 | 4959
portland_(cashmore_airport) | 2023-05-06 | 5223
corner_inlet_(yanakie) | 2025-07-03 | 4523
kanagulk | 2025-07-03 | 6003
aireys_inlet | 2025-08-24 | 6073
avalon_airport | 2025-08-24 | 6075
bairnsdale_airport | 2025-08-24 | 6074
ballarat_aerodrome | 2025-08-24 | 6069
bendigo_airport | 2025-08-24 | 6061
breakwater_(geelong_racecourse) | 2025-08-24 | 5189
cape_nelson_lighthouse | 2025-08-24 | 6058
cape_otway_lighthouse | 2025-08-24 | 6068
casterton | 2025-08-24 | 6058
cerberus | 2025-08-24 | 6059
charlton | 2025-08-24 | 6074
colac_(mount_gellibrand) | 2025-08-24 | 6056
coldstream | 2025-08-24 | 6075
combienbar | 2025-08-24 | 6040
dartmoor | 2025-08-24 | 6008
dinner_plain_(mount_hotham_airport) | 2025-08-24 | 6061
east_sale_airport | 2025-08-24 | 5811
edenhope_airport | 2025-08-24 | 6066
eildon_fire_tower | 2025-08-24 | 6075
essendon_airport | 2025-08-24 | 6075
falls_creek | 2025-08-24 | 6013
ferny_creek | 2025-08-24 | 5285
frankston_(ballam_park) | 2025-08-24 | 1881
gabo_island_lighthouse | 2025-08-24 | 6066
gelantipy | 2025-08-24 | 6051
grampians_(mount_william) | 2025-08-24 | 5942
hamilton_airport | 2025-08-24 | 6068
hopetoun_airport | 2025-08-24 | 6062
horsham_aerodrome | 2025-08-24 | 6073
hunters_hill | 2025-08-24 | 6015
kellalac_(warracknabeal_airport) | 2025-08-24 | 3178
kerang_airport | 2025-08-24 | 55
kyabram | 2025-08-24 | 4912
laverton_raaf | 2025-08-24 | 6074
longerenong | 2025-08-24 | 6066
mallacoota | 2025-08-24 | 6060
mangalore_airport | 2025-08-24 | 6073
melbourne_airport | 2025-08-24 | 6075
melbourne_(olympic_park) | 2025-08-24 | 4249
mildura_airport | 2025-08-24 | 6074
moorabbin_airport | 2025-08-24 | 6075
mortlake_racecourse | 2025-08-24 | 6075
morwell_(latrobe_valley_airport) | 2025-08-24 | 6069
mount_baw_baw | 2025-08-24 | 5936
mount_buller | 2025-08-24 | 6005
mount_hotham | 2025-08-24 | 5703
mount_moornapa | 2025-08-24 | 6049
mount_nowa_nowa | 2025-08-24 | 6066
nhill_aerodrome | 2025-08-24 | 6069
omeo | 2025-08-24 | 6072
orbost | 2025-08-24 | 6059
point_cook_raaf | 2025-08-24 | 1360
port_fairy | 2025-08-24 | 6074
portland_airport | 2025-08-24 | 847
pound_creek | 2025-08-24 | 6032
puckapunyal_lyon_hill_(defence) | 2025-08-24 | 2520
puckapunyal_west_(defence) | 2025-08-24 | 2520
pyrenees_(ben_nevis) | 2025-08-24 | 6042
redesdale | 2025-08-24 | 6064
rhyll | 2025-08-24 | 6072
rutherglen_research | 2025-08-24 | 6062
scoresby_research_institute | 2025-08-24 | 6068
she_oaks | 2025-08-24 | 1120
shepparton_airport | 2025-08-24 | 6071
stawell_aerodrome | 2025-08-24 | 6067
swan_hill_aerodrome | 2025-08-24 | 6073
tatura_inst_sustainable_ag | 2025-08-24 | 5480
viewbank | 2025-08-24 | 4920
wallan_(kilmore_gap) | 2025-08-24 | 6075
walpeup_research | 2025-08-24 | 6072
wangaratta_aero | 2025-08-24 | 6053
warrnambool_airport_ndb | 2025-08-24 | 6074
westmere | 2025-08-24 | 6059
wilsons_promontory_lighthouse | 2025-08-24 | 6035
yarram_airport | 2025-08-24 | 6072
yarrawonga | 2025-08-24 | 6063
(96 rows)
Future Dates
Check for broken clocks at BOM sites stamping a future date and time on collected metrics.
SELECT *
FROM bom_vic_climate
WHERE date > CURRENT_DATE;
Output…
station_name | date | evapotranspiration_mm | rain_mm | pan_evaporation_mm | max_temp_c | min_temp_c | max_rh_percent | min_rh_percent | avg_wind_speed_mps | radiation_mj_m2 | source_filename
--------------+------+-----------------------+---------+--------------------+------------+------------+----------------+----------------+--------------------+-----------------+-----------------
(0 rows)
Data File Check
Count the rows of data loaded from each BOM data file.
SELECT
source_filename,
COUNT(*) AS row_count
FROM bom_vic_climate
GROUP BY source_filename
ORDER BY row_count DESC,source_filename asc
;
Output…
source_filename | row_count
------------------------------------------------+-----------
aireys_inlet-200901.csv | 31
aireys_inlet-200903.csv | 31
aireys_inlet-200905.csv | 31
aireys_inlet-200907.csv | 31
...
...
dinner_plain_(mount_hotham_airport)-202503.csv | 31
dinner_plain_(mount_hotham_airport)-202505.csv | 31
dinner_plain_(mount_hotham_airport)-202507.csv | 31
east_sale-201710.csv | 31
...
...
mount_hotham-201506.csv | 12
pound_creek-202310.csv | 11
pyrenees_(ben_nevis)-201602.csv | 11
wilsons_promontory_lighthouse-201201.csv | 8
east_sale_airport-201709.csv | 7
glenlitta_avenue_-_micromac_site-201408.csv | 7
grampians_(mount_william)-201806.csv | 7
grovedale_(geelong_airport)-201106.csv | 7
kanagulk-202507.csv | 7
melbourne_(150_lonsdale_st)-201306.csv | 7
mount_hotham-202309.csv | 7
portland_(cashmore_airport)-202305.csv | 7
sheoaks-202208.csv | 7
coldstream_comparison-202011.csv | 5
melbourne_regional_office-201501.csv | 5
corner_inlet_(yanakie)-202507.csv | 3
mount_hotham-202401.csv | 2
(15071 rows)
Null / NaN Count Check by Column
This query checks which columns have the most NaN values (nulls are stored as NaN, short for Not a Number, by the Python pandas process). If you’re looking for true NULL values, change “= 'NaN'” to “IS NULL“.
SELECT
COUNT(*) FILTER (WHERE evapotranspiration_mm = 'NaN') AS evapotranspiration_nulls,
COUNT(*) FILTER (WHERE rain_mm = 'NaN') AS rain_nulls,
COUNT(*) FILTER (WHERE pan_evaporation_mm = 'NaN') AS pan_evaporation_nulls,
COUNT(*) FILTER (WHERE max_temp_c = 'NaN') AS max_temp_nulls,
COUNT(*) FILTER (WHERE min_temp_c = 'NaN') AS min_temp_nulls,
COUNT(*) FILTER (WHERE max_rh_percent = 'NaN') AS max_rh_nulls,
COUNT(*) FILTER (WHERE min_rh_percent = 'NaN') AS min_rh_nulls,
COUNT(*) FILTER (WHERE avg_wind_speed_mps = 'NaN') AS wind_nulls,
COUNT(*) FILTER (WHERE radiation_mj_m2 = 'NaN') AS radiation_nulls,
COUNT(*) FILTER (WHERE source_filename = 'NaN') AS filename_nulls
FROM bom_vic_climate
;
Output
evapotranspiration_nulls | rain_nulls | pan_evaporation_nulls | max_temp_nulls | min_temp_nulls | max_rh_nulls | min_rh_nulls | wind_nulls | radiation_nulls | filename_nulls
--------------------------+------------+-----------------------+----------------+----------------+--------------+--------------+------------+-----------------+----------------
13848 | 12490 | 440510 | 6173 | 5320 | 1863 | 1863 | 1863 | 4528 | 0
(1 row)
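Those NaNs exist because pandas represents missing numerics as the float NaN, and psycopg2 passes that straight through into the REAL columns as Postgres’s special 'NaN' value rather than SQL NULL. If true NULLs are preferred, the values can be swapped to None before insert, since psycopg2 maps None to NULL. A sketch of the idea, assuming a DataFrame shaped like the loader’s:

```python
import math

import pandas as pd

df = pd.DataFrame({"station_name": ["VIEWBANK", "VIEWBANK"],
                   "rain_mm": [0.4, float("nan")]})

# Build insert tuples with NaN replaced by None (psycopg2 renders None as NULL)
records = [
    tuple(None if (isinstance(v, float) and math.isnan(v)) else v for v in row)
    for row in df.itertuples(index=False, name=None)
]
print(records)  # [('VIEWBANK', 0.4), ('VIEWBANK', None)]
```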
Null / NaN Count Check by Column and Station Name
Same as the above, but broken down by station name.
SELECT
station_name,
COUNT(*) FILTER (WHERE rain_mm = 'NaN') AS rain_nulls,
COUNT(*) FILTER (WHERE pan_evaporation_mm = 'NaN') AS pan_evaporation_nulls,
COUNT(*) FILTER (WHERE max_temp_c = 'NaN') AS max_temp_nulls,
COUNT(*) FILTER (WHERE min_temp_c = 'NaN') AS min_temp_nulls,
COUNT(*) FILTER (WHERE max_rh_percent = 'NaN') AS max_rh_nulls,
COUNT(*) FILTER (WHERE min_rh_percent = 'NaN') AS min_rh_nulls,
COUNT(*) FILTER (WHERE avg_wind_speed_mps = 'NaN') AS wind_nulls,
COUNT(*) FILTER (WHERE radiation_mj_m2 = 'NaN') AS radiation_nulls,
COUNT(*) FILTER (WHERE source_filename = 'NaN') AS filename_nulls
FROM bom_vic_climate
GROUP BY station_name
ORDER BY station_name
;
Output…
station_name | rain_nulls | pan_evaporation_nulls | max_temp_nulls | min_temp_nulls | max_rh_nulls | min_rh_nulls | wind_nulls | radiation_nulls | filename_nulls
-------------------------------------+------------+-----------------------+----------------+----------------+--------------+--------------+------------+-----------------+----------------
aireys_inlet | 61 | 6073 | 32 | 28 | 18 | 18 | 18 | 43 | 0
albury_airport | 0 | 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0
alice_springs_airport | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0
avalon_airport | 47 | 6075 | 49 | 43 | 2 | 2 | 2 | 43 | 0
bairnsdale_airport | 25 | 6074 | 27 | 13 | 11 | 11 | 11 | 43 | 0
ballarat_aerodrome | 25 | 6069 | 33 | 18 | 2 | 2 | 2 | 44 | 0
bendigo_airport | 36 | 6061 | 34 | 25 | 12 | 12 | 12 | 43 | 0
breakwater_(geelong_racecourse) | 68 | 5189 | 111 | 83 | 14 | 14 | 14 | 15 | 0
cape_nelson_lighthouse | 19 | 6058 | 21 | 12 | 3 | 3 | 3 | 42 | 0
cape_otway_lighthouse | 71 | 6068 | 74 | 60 | 20 | 20 | 20 | 43 | 0
casterton | 29 | 6058 | 31 | 28 | 18 | 18 | 18 | 42 | 0
cerberus | 32 | 6059 | 75 | 59 | 15 | 15 | 15 | 42 | 0
charlton | 39 | 6074 | 24 | 16 | 8 | 8 | 8 | 42 | 0
colac_(mount_gellibrand) | 155 | 6056 | 53 | 50 | 12 | 12 | 12 | 42 | 0
coldstream | 30 | 6075 | 26 | 19 | 15 | 15 | 15 | 42 | 0
coldstream_comparison | 157 | 4101 | 147 | 139 | 84 | 84 | 84 | 49 | 0
...
...
wilsons_promontory_lighthouse | 114 | 6035 | 104 | 91 | 50 | 50 | 50 | 42 | 0
yarram_airport | 24 | 6072 | 20 | 15 | 10 | 10 | 10 | 42 | 0
yarrawonga | 99 | 6063 | 106 | 88 | 44 | 44 | 44 | 42 | 0
(96 rows)