Remove Duplicate Records from a CSV File (Bash)
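This function cleans up a dataset by removing duplicate rows from a CSV file and writing the result to a new file. It relies on the DuckDB command-line client and SELECT DISTINCT, so only rows that match in every column count as duplicates. It’s especially useful for ensuring data integrity before analysis.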
#!/bin/bash

function remove_duplicates() {
    input_file="$1"   # Input CSV file with duplicates
    output_file="$2"  # Deduplicated output CSV file

    # Use DuckDB to remove duplicate rows and write the cleaned data to a new CSV file.
    duckdb -c "COPY (SELECT DISTINCT * FROM read_csv_auto('$input_file')) TO '$output_file' (FORMAT CSV, HEADER TRUE);"
}

# Usage
remove_duplicates "input_data.csv" "cleaned_data.csv"
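Where DuckDB is not installed, a similar result can be sketched with awk alone. This is a different technique from the DuckDB query above, not a drop-in equivalent: it compares raw text lines rather than parsed values, so it assumes every record sits on a single line (no quoted fields containing newlines) and that duplicates are byte-for-byte identical. The function name remove_duplicates_awk and the sample data are hypothetical, introduced only for illustration. Unlike SELECT DISTINCT, which does not promise any row order, this version keeps the first occurrence of each row in its original position.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original snippet): remove exact
# duplicate lines from a CSV using awk only.
remove_duplicates_awk() {
    local input_file="$1"   # Input CSV file with duplicates
    local output_file="$2"  # Deduplicated output CSV file

    # NR == 1 always passes the header line through.
    # seen[$0]++ evaluates to 0 the first time a line appears, so
    # !seen[$0]++ is true only for the first occurrence of each line.
    awk 'NR == 1 || !seen[$0]++' "$input_file" > "$output_file"
}

# Usage with a small sample file (made-up data)
demo_dir="$(mktemp -d)"
printf 'id,name\n1,alice\n2,bob\n1,alice\n2,bob\n3,carol\n' > "$demo_dir/input_data.csv"
remove_duplicates_awk "$demo_dir/input_data.csv" "$demo_dir/cleaned_data.csv"
cat "$demo_dir/cleaned_data.csv"
```

With the sample data, the output file contains the header followed by the rows for alice, bob, and carol, each once.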