Working with large files (bash)

to count number of rows in file:


wc -l largefile.csv
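
A small sketch of getting the number of data rows (excluding the header) as a bare number. The file name sample.csv and its contents are made up for illustration. Note that wc -l counts newline characters, so a file whose last line has no trailing newline is reported one short.

```shell
# Build a tiny sample file (hypothetical data) so the commands can be tried safely.
printf 'id,name\n1,a\n2,b\n3,c\n' > sample.csv

# Reading from stdin with "<" makes wc print only the count, without the file name.
total_lines=$(wc -l < sample.csv)

# Subtract the header line to get the number of data rows.
data_rows=$((total_lines - 1))

echo "$data_rows"
```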

to see the top 5 rows in file:


head -5 largefile.csv

to filter rows by certain column in file (e.g. column 5 >= 20 and != 255) and output to another file

awk '$5 >= 20 && $5 != 255{print $0}' largefile.csv > output.csv
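
The command above relies on awk's default field splitting, which is on whitespace. If the file is really comma-delimited, the separator probably needs to be set with -F. A minimal sketch, assuming a simple CSV with no quoted commas inside fields (sample.csv and filtered.csv are hypothetical names):

```shell
# Hypothetical sample: five comma-separated columns, no header.
printf '1,2,3,4,10\n1,2,3,4,20\n1,2,3,4,255\n1,2,3,4,300\n' > sample.csv

# -F ',' splits each line on commas; a pattern with no action prints the matching line.
awk -F ',' '$5 >= 20 && $5 != 255' sample.csv > filtered.csv

cat filtered.csv
```

Only the rows whose fifth column is 20 and 300 end up in filtered.csv. Fields that contain quoted commas would need a real CSV parser rather than plain -F.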

to filter rows by certain column (numerical only - e.g. col 3 == 99) in file and output to another file, with header (pipe-delimited)

head -1 largefile.csv > output.csv &&
awk -F "|" '$3 == 99 {print $0}' largefile.csv >> output.csv
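
The header and the filtered rows can also be written in one pass over the file, which avoids reading a large file twice. A sketch, assuming the header is always the first line (sample.psv and filtered.psv are hypothetical names):

```shell
# Hypothetical pipe-delimited sample with a header line.
printf 'id|name|code\n1|a|99\n2|b|7\n3|c|99\n' > sample.psv

# NR == 1 keeps the header; every other line is kept only if column 3 equals 99.
awk -F '|' 'NR == 1 || $3 == 99' sample.psv > filtered.psv

cat filtered.psv
```

Because there is a single command writing with a single ">", there is no ordering to get wrong between the header and the data rows.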

to filter rows by certain column (string - e.g. col 1 contains "dec") in file and output to another file, with header (pipe-delimited)

head -1 largefile.csv > output.csv &&
awk -F "|" 'match($1,/dec/) {print $0}' largefile.csv >> output.csv
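
match() and the ~ operator test a regular expression, so they also select values that merely contain the text. When only an exact value is wanted, a string comparison is one option. A sketch comparing the two on made-up data (file names are hypothetical):

```shell
# Hypothetical sample: column 1 holds "dec", "december" and "jan".
printf 'month|value\ndec|1\ndecember|2\njan|3\n' > sample.psv

# Regex match: keeps the header plus any row whose column 1 contains "dec".
awk -F '|' 'NR == 1 || $1 ~ /dec/' sample.psv > contains.psv

# String equality: keeps the header plus rows whose column 1 is exactly "dec".
awk -F '|' 'NR == 1 || $1 == "dec"' sample.psv > exact.psv

cat contains.psv
cat exact.psv
```

Here contains.psv holds the "dec" and "december" rows, while exact.psv holds only the "dec" row.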

to filter rows containing string and output to another file, with header (NR > 1 keeps the header from being written twice if it also contains the string)

head -1 largefile.csv > output.csv &&
awk 'NR > 1 && /dec/' largefile.csv >> output.csv

modified from https://stackoverflow.com/questions/29503699/filtering-a-csv-file-with-awk

also, check out https://en.wikibooks.org/wiki/An_Awk_Primer/Awk_Command-Line_Examples for more details.

https://www.theurbanpenguin.com/filtering-with-awk/ is also very helpful!!