csvkit
command line tool to manipulate excel and csv files
link to tutorial page and documentation really awesome, read this first
Installation for gitbash
This run without problems but installed in a different path
So I just copied the files into /usr/bin/
The scripts got installed here:
C:\Users\vld\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\Scripts
To get to the hidden AppData folder pres WindowButton + R and then type %AppData%
Adjust csv data and insert into a text document
Insert a table from "stacked.csv" into document in vim:
.r ! csvlook stacked.csv
Insert only selected columns of table
To insert only "group" and "peakNo" columns.
.r ! csvcut -c group,peakNo | csvlook
Insert table sorted by one or multiple columns
To insert a table sorted by two columns (max and peakNo):
.r ! csvsort -c max,peakNo stacked.csv | csvlook
-r flag: revert the sorting to descending
Getting example data
in2csv the excel killer
transform to csv and show in terminal
in2csv ne_1033_data.xlsx
transform to csv and save as data.csv
in2csv ne_1033_data.xlsx > data.csv
csvlook: dataperiscope
to get the rough idea of the data use csvlook
csvlook data.csv
to get it paged, separated in columns and able to navigate through, use less
csvlook data.csv | less -S
csvcut: datascalpel
to select and reorder columns
to display the numbers and names of the columns use -n flag*
csvcut -n data.csv
Insert table filtered by word in given column
To select only rows containing "data2" use -m flag:
.r ! csvgrep -c group -m data2 stacked.csv | csvlook
-i flag: to revert from select matching to return not matching
Insert table filtered by regular expression in given column
To filter rows which in group column contain "ta2" or "ta3" use -r flag:
.r ! csvgrep -c group -r "ta2|ta3" stacked.csv | csvlook
-i flag: to revert from select matching to return not matching
Filter table rows by matching list contained in a file
Cell in csv is matched exactly by line
To filter by file "filter-file" which contains 2 lines: data3-test.csv and data1-test.csv.\
Print the lines containing what is in the file\
.r ! csvgrep -c group -f filter-file stacked.csv | csvlook
print lines not containing what is in the file
.r ! csvgrep -c group -if filter-file stacked.csv | csvlook
Cell in csv contains string found in file
To filter by file which contains two lines: data3 and data1
.r ! cat stacked.csv | grep -Fvwf filter-file | csvlook
grep flags:
-f: take from file
-F: as fixed strings
-W: only lines with whole words matches
-v: select non matching lines (filtering)\
Stack first lines of csvs together - csvstack approach
stack the csvs --filename flag creates a column with filenames
csvstack --filenames data*t.csv > stacked.csv
select the (number of peaks) you want to keep
csvgrep -c peakNo -r "^[1-2]$" stacked.csv > filtered.csv
Or this is an alternative
csvgrep -c peakNo -r "^1|2$" stacked.csv > filtered.csv
Look at the data
csvlook filtered.csv
Do a summary statistics
csvstat filtered.csv
Getting tables into markdown files
Installation for gitbash
pip --version check if the pip is installed
pip install csvkit try to install
in case the output contains "Warning ... is not in the path", add the folder to the path following way:
PATH_BACKUP=$PATH create a copy of path just in case
the path is formatted in windows style C:\something\something and needs to be reformatted into linux style
copy the path of the folder indicated in the folder in text file
first replace the '\' with '/'
second change 'C:' into '/c/'
in the terminal add the changed string into the PATH using PATH=$PATH:<your string>
echo $PATH to see if it is there
run csvkit --version to see if it is installed