gene-expression-chipseq

How the graphs for the paper were made

Use cases

Plot genes from one or more gene ontologies, one group per graph (from the DGE table)
Plot genes from one or more gene ontologies, multiple groups per graph (from the DGE table)
Counting number of genes in chipseq tables
Processing the chipseq data

Data files description

DGE_source-table.xlsx: the Excel workbook where I did the filtering to get the significant genes; it keeps all the data.
DGE_all.csv: table of differential gene expression, both significant and non-significant genes.
DGE_significant.csv: table of only the significant genes, coming from DGE_all.xlsx.
DGE_gene-symbols_all.csv: just the gene symbols from DGE_all.csv.
DGE_gene-symbols_significant.csv: just the gene symbols from DGE_significant.csv.
data_ChIP-seq_day-10_fixed.csv: the chipseq table containing all the hits.

Data files links

DGE_all.csv (2MB)
DGE_source-table.xlsx (3MB)
data_ChIP-seq_day-10_fixed.csv (3MB)

Script files

1. `DGE_get-unique-gene-symbol.R`

Takes a single DGE table and returns a file with a list of unique gene symbols.
Useful for getting the list of all genes detected in RNAseq, or of only the significantly dysregulated ones.
This turns a DGE table into a filter that can then be used to filter the chipseq data.
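
The idea can be sketched in a few lines. The snippet below is a Python illustration, not the R script itself; the column name `gene_symbol` comes from the DGE tables described here, while the `log2FoldChange` column and the example rows are made up.

```python
import csv
import io


def unique_gene_symbols(csv_text, column="gene_symbol"):
    """Return the unique, non-empty values of `column` in first-seen order."""
    seen = set()
    symbols = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        symbol = (row.get(column) or "").strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


# Made-up example table: MYL9 appears twice but is reported once.
example = "gene_symbol,log2FoldChange\nMYL9,1.8\nCOL1A1,-2.1\nMYL9,1.8\n"
print(unique_gene_symbols(example))  # prints ['MYL9', 'COL1A1']
```

Passing `column="Gene Name"` would apply the same logic to a table that uses that header instead.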

2. `chipseq_get-unique-gene-symbol.R`

The same as above, but for the chipseq table. Two scripts are needed because the original chipseq table has a Gene Name column instead of the gene_symbol column used in the DGE tables.
Can be used to create filters for the DGE dataset.

3. `DGE_read-and-filter_exact-only.R`

Takes two inputs:
- several filter files (each with a list of gene names)
- a single DGE_* gene expression table.
Filters the DGE_* gene expression table.
Creates an output folder with the date in its name.
Matching is case-insensitive but exact: a gene is kept only if its whole name equals an entry in the filter.
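
What "case insensitive, but exact" means can be shown with a small Python sketch. This is an illustration under assumptions, not the R script: the function names, the `log2FoldChange` column and the example rows are hypothetical; only the `gene_symbol` column name comes from the text above.

```python
import csv
import io


def load_filter(filter_text):
    """Read one gene name per line into a set of case-folded names."""
    return {line.strip().casefold() for line in filter_text.splitlines() if line.strip()}


def filter_table(csv_text, wanted, column="gene_symbol"):
    """Keep only rows whose whole gene name matches an entry in `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get(column) or "").strip().casefold() in wanted]


# Made-up table and filter. "myl9" matches MYL9 despite the case difference,
# while the partial name "MYL" matches nothing because matching is exact.
table = "gene_symbol,log2FoldChange\nMYL9,1.8\nMYL12A,0.4\nCOL1A1,-2.1\n"
wanted = load_filter("myl9\ncol1a1\nMYL\n")
kept = filter_table(table, wanted)
print([row["gene_symbol"] for row in kept])  # prints ['MYL9', 'COL1A1']
```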

4. `DGE_plot-top*`

Takes multiple files that are outputs of DGE_read-and-filter... and creates a graph of the top x DGE genes.
All files are in an output folder.

5. `append_csvs.R`

Appends the csvs created by DGE_read-and-filter_exact-only.R so that multiple groups of genes can be plotted together.
Creates a new column called gene_ontology holding the name of the csv file each line came from.
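
A minimal Python sketch of this appending step follows. It is not the R script; the function name, the file names and the table contents are hypothetical, and it assumes the gene_ontology column holds the file name exactly as given.

```python
import csv
import io


def append_tables(tables):
    """Stack several CSV tables and tag every row with its source file name.

    `tables` maps a file name to that file's CSV text.
    """
    combined = []
    for file_name, csv_text in tables.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            tagged = dict(row)
            tagged["gene_ontology"] = file_name  # where this line came from
            combined.append(tagged)
    return combined


# Made-up filtered tables standing in for two gene ontology groups.
tables = {
    "GO_example_a.csv": "gene_symbol,log2FoldChange\nGENE_A,1.0\n",
    "GO_example_b.csv": "gene_symbol,log2FoldChange\nGENE_B,-2.0\nGENE_C,0.5\n",
}
for row in append_tables(tables):
    print(row["gene_symbol"], row["gene_ontology"])
```

With the group stored in its own column, a plotting script can colour or facet the genes by gene_ontology.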

6. `reformatting-chipseq.R`

Reformats data_ChIP-seq_day-10_fixed.csv to get the binding locations and intron numbers in a nicer format.
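
As a rough illustration of this kind of reformatting, the Python sketch below splits an annotation string into a location and an intron number. The input format is an assumption (strings in the style of `intron (NM_000000, intron 2 of 7)`); the actual layout of the chipseq table is not described here, and the function name and the example strings are hypothetical.

```python
import re


def parse_annotation(text):
    """Split an annotation string into (location, intron_number).

    Assumes the location is the text before the first "(" and that intron
    hits carry "intron N of M" inside the brackets. Returns None for the
    intron number when the string does not mention one.
    """
    location_match = re.match(r"\s*([^(]+?)\s*(?:\(|$)", text)
    location = location_match.group(1) if location_match else None
    intron_match = re.search(r"intron\s+(\d+)\s+of\s+\d+", text)
    intron_number = int(intron_match.group(1)) if intron_match else None
    return location, intron_number


print(parse_annotation("intron (NM_000000, intron 2 of 7)"))  # prints ('intron', 2)
print(parse_annotation("promoter-TSS (NM_000000)"))  # prints ('promoter-TSS', None)
```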

7. `chipseq_read-and-filter_exact-only.R`

Works the same as DGE_read-and-filter... .R but requires the data_ChIP-seq_day-10_fixed.csv file as the table to filter. Creates the same kind of output in a folder with the date in its name.
Useful for getting the chipseq hits for genes that are in the DGE table or in the DGE table of significant genes.

8. `count-binding-sites.R`

Takes the reformatted chipseq table (with the location column) and counts the binding sites: all sites, and also just the genes where binding occurs.
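
The difference between counting sites and counting genes is easy to see in a small Python sketch. This is an illustration rather than the R script; the function name and the example rows are made up, and the column names `Gene Name` and `location` are taken from the descriptions in this document.

```python
from collections import Counter, defaultdict


def count_binding(rows, gene_col="Gene Name", loc_col="location"):
    """Return (sites per location, unique genes per location)."""
    sites = Counter()
    genes = defaultdict(set)
    for row in rows:
        location = row[loc_col]
        sites[location] += 1  # every hit counts as one binding site
        genes[location].add(row[gene_col])  # a gene counts once per location
    return dict(sites), {loc: len(names) for loc, names in genes.items()}


# Made-up hits: GENE_A is bound twice in introns, so it adds two sites but one gene.
rows = [
    {"Gene Name": "GENE_A", "location": "intron"},
    {"Gene Name": "GENE_A", "location": "intron"},
    {"Gene Name": "GENE_B", "location": "intron"},
    {"Gene Name": "GENE_B", "location": "promoter"},
]
site_counts, gene_counts = count_binding(rows)
print(site_counts)  # prints {'intron': 3, 'promoter': 1}
print(gene_counts)  # prints {'intron': 2, 'promoter': 1}
```

The same function would give per-intron counts if `loc_col` pointed at the intron number column instead.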

9. `chipseq_gene-names_by-binding-locations.R`

Splits the reformatted chipseq table by binding location and writes the list of genes for each location into csv files.
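
A possible shape for this split, sketched in Python rather than taken from the R script: rows are grouped by location and each group is reduced to a sorted list of unique gene names. The function name and the example rows are hypothetical, and writing each list to its own csv file is left out.

```python
from collections import defaultdict


def genes_by_location(rows, gene_col="Gene Name", loc_col="location"):
    """Map each binding location to the sorted unique gene names found there."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[loc_col]].add(row[gene_col])
    return {location: sorted(names) for location, names in groups.items()}


# Made-up hits; GENE_A appears twice in introns but is listed once.
rows = [
    {"Gene Name": "GENE_B", "location": "intron"},
    {"Gene Name": "GENE_A", "location": "intron"},
    {"Gene Name": "GENE_A", "location": "intron"},
    {"Gene Name": "GENE_C", "location": "exon"},
]
print(genes_by_location(rows))  # prints {'intron': ['GENE_A', 'GENE_B'], 'exon': ['GENE_C']}
```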

DGE_read-and-filter_exact-only.R (4KB)

Specifications

Filter files

A filter file contains one gene_symbol (for differential gene expression tables) or one Gene Name (for the chipseq data table) per line. Filter files are used to select the specified genes from the above-mentioned tables.

Gene ontologies filters

Gene ontology filters are obtained from AmiGO 2. The filters are stored in the _filters_gene-onotologies_specific-genes folder.

Getting the dataset from AmiGO 2:

Select the gene ontology
Select Homo sapiens
Select the gene_label
Save as a .txt file named with the ontology name and the GO number

Other filter files

These are filters for specific sets of genes, either hand-picked or sharing a common name.

Custom filter hand-picked

Write the gene_symbols into a text file, one gene per line.

Gene groups with common name

For collagens, myosins, etc., use grep to pull the gene names that start with a common sequence of characters out of the differentially expressed genes.

grep -ie '^myl' DGE_gene-symbols_all.csv > filt_myl.csv

Next, open the file and remove the genes you are not interested in.
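
For anyone working without grep, the same prefix selection can be sketched in Python. The function name and the example symbols are hypothetical; the sketch assumes one gene symbol per input line, as in DGE_gene-symbols_all.csv.

```python
import re


def symbols_with_prefix(symbols, prefix):
    """Keep the symbols that start with `prefix`, ignoring case."""
    pattern = re.compile("^" + re.escape(prefix), re.IGNORECASE)
    return [symbol for symbol in symbols if pattern.search(symbol)]


# Made-up list: only names that begin with "myl" (in any case) are kept.
symbols = ["MYL9", "Myl12a", "COL1A1", "SMYL"]
print(symbols_with_prefix(symbols, "myl"))  # prints ['MYL9', 'Myl12a']
```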

Pipelines

1. Processing the Differential expression data to plot single gene ontologies

Create the filters.
Run DGE_read-and-filter_exact-only.R on the selected filters and DGE_all.csv.
Check the output_date.. folder for the filtered tables.
Run the DGE_plot-top*.R script on the filtered tables.
Check the output graphs in the graphs_date.. folder and copy the useful ones into a _selected_graphs folder.

2. Processing the Differential expression data to plot multiple gene ontologies in one graph

Create the filters.
Run DGE_read-and-filter_exact-only.R on the selected filters and DGE_all.csv.
Check the output_date.. folder for the filtered tables.
Run append_csvs.R to create one file from the files from the previous step.
Run the DGE_plot-vertical-grouped.R script on the table of appended csvs.
Check the output graphs in the graphs_date.. folder and copy the useful ones into a _selected_graphs folder.

3. Counting number of genes in chipseq tables

The chipseq dataset contains multiple entries per gene. To get the list of unique genes in the table or in a subset of it, run chipseq_get-unique-gene-symbol.R. The resulting list of gene names is easy to count when opened in Excel.

4. Processing the chipseq data

Run the reformatting-chipseq.R script on data_ChIP-seq_day-10_fixed.csv to add the columns location (the location of binding: promoter, intron, exon, 3UTR, etc.) and intron_number (intron numbers from 1 to 5). The result is a file called chipseq_with_locations.csv in the output_date.. folder. Rename it to chipseq_with_locations_all.csv.
Run chipseq_read-and-filter_exact-only.R with the filter DGE_gene-symbols_significant.csv on chipseq_with_locations_all.csv to get only the significantly dysregulated genes. Rename the result to chipseq_with_locations_signif.csv.
Run count-binding-sites.R on chipseq_with_locations_all.csv to count the YAP1 binding to:
- locations (intron, exon, promoter, etc.), counting all the binding sites (YAP1 can bind to more than one site in one location of one gene)
- introns 1 to 5, counting all the sites (again, it can bind more than once in intron 1, for example)
- locations, counting just the genes where it is bound
- introns 1 to 5, counting just the genes where it is bound
Run chipseq_gene-names_by-binding-locations.R on chipseq_with_locations_all.csv to get the gene name sets for all the locations, and for introns 1 to 5 in detail, in the output_date... folder.
