Ran BLAST between the current S.div assembly (given at this link) and the Arabidopsis FLOE 1, FLOE 2, and FLOE 3 "Full length CDS" and "protein" sequences from TAIR.
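
The two TAIR query types need different BLAST+ programs against a nucleotide assembly: `blastn` for the CDS and `tblastn` for the protein. Below is a minimal sketch of building those commands, assuming BLAST+ is installed. The file names and the `1e-5` e-value cutoff are hypothetical, and the settings actually used here were not recorded.

```python
from __future__ import annotations

import subprocess

# Query type -> BLAST+ program for searching a nucleotide assembly.
PROGRAMS = {"nucleotide": "blastn", "protein": "tblastn"}


def blast_command(query_fasta: str, subject_fasta: str, query_type: str,
                  out_path: str, evalue: float = 1e-5) -> list[str]:
    """Build a BLAST+ command for one TAIR FLOE query vs. the assembly."""
    if query_type not in PROGRAMS:
        raise ValueError(f"unknown query_type: {query_type!r}")
    return [
        PROGRAMS[query_type],
        "-query", query_fasta,
        "-subject", subject_fasta,  # plain FASTA, no makeblastdb step
        "-outfmt", "6",             # tabular output, 12 standard columns
        "-evalue", str(evalue),     # hypothetical cutoff
        "-out", out_path,
    ]


def run_blast(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError if BLAST fails."""
    subprocess.run(cmd, check=True)
```

For example, `run_blast(blast_command("FLOE1_protein.fa", "sdiv_assembly.fa", "protein", "floe1_protein.tsv"))`. For a large assembly, building a database once with `makeblastdb` and passing `-db` instead of `-subject` is usually faster.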

Put the BLAST results onto this spreadsheet and extracted the best-matching sequences for each FLOE, with a 500-nt buffer on each side. Made three files, one for each FLOE and its best-matching sequences:
- FLOE 1: ptg000004l and ptg000005l
- FLOE 2: ptg000009l and ptg000010l
- FLOE 3: ptg000001l, ptg000009l, ptg000010l, and ptg000013l
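
Picking the best-matching contigs and cutting out each region with its 500-nt buffer can be scripted from BLAST tabular output. Below is a minimal sketch, assuming the default `-outfmt 6` column order and that each region runs from the lowest to the highest hit coordinate on the contig. Ranking contigs by their single best bitscore is my assumption and may not match how the spreadsheet was sorted.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass

# Default column order of BLAST+ -outfmt 6.
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch",
                  "gapopen", "qstart", "qend", "sstart", "send",
                  "evalue", "bitscore"]


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    sstart: int
    send: int
    bitscore: float


def parse_outfmt6(text: str) -> list[Hit]:
    """Parse BLAST tabular output, skipping blank and comment lines."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(OUTFMT6_FIELDS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], int(rec["sstart"]),
                        int(rec["send"]), float(rec["bitscore"])))
    return hits


def best_contigs(hits: list[Hit], top_n: int) -> list[str]:
    """Rank contigs by their best single-hit bitscore; return the top_n."""
    best: dict[str, float] = {}
    for h in hits:
        best[h.sseqid] = max(best.get(h.sseqid, float("-inf")), h.bitscore)
    return sorted(best, key=lambda c: best[c], reverse=True)[:top_n]


def region_for(hits: list[Hit], contig: str) -> tuple[int, int]:
    """1-based inclusive span covering every hit on `contig`."""
    coords = [c for h in hits if h.sseqid == contig for c in (h.sstart, h.send)]
    return min(coords), max(coords)


def extract_with_flank(contig_seq: str, start: int, end: int,
                       flank: int = 500) -> str:
    """Slice start..end (1-based, inclusive, either order) plus `flank` nt
    on each side, clamped to the contig ends. Minus-strand hits
    (sstart > send) are returned as-is, not reverse-complemented."""
    lo = max(1, min(start, end) - flank)
    hi = min(len(contig_seq), max(start, end) + flank)
    return contig_seq[lo - 1:hi]
```

The contig sequence passed to `extract_with_flank` still has to be read from the assembly FASTA.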

Had to reverse-complement the ptg000009l sequence.
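
Reverse-complementing is the usual fix when BLAST reports a contig's hits on the minus strand (`sstart` > `send`). Below is a minimal sketch of the operation in plain Python. It handles lowercase (soft-masked) bases and IUPAC ambiguity codes and leaves alignment gaps (`-`) untouched. This only illustrates the operation and is not necessarily how the step was done here.

```python
# IUPAC nucleotide complements. S, W and gap characters are not in the
# table, so translate() leaves them unchanged (they are self-complementary).
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVNacgtrykmbdhvn",
                            "TGCAYRMKVHDBNtgcayrmkvhdbn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGCN")` returns `"NGCAT"`.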

Used UGENE to align each file with MAFFT. (Also tried other aligners, but did not use those alignments.)
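
UGENE runs MAFFT as an external tool, so the same step can be reproduced outside the GUI. Below is a minimal sketch, assuming `mafft` is installed and on `PATH`. `--auto` is my choice of strategy because the UGENE settings used here were not recorded, so the output may not match the UGENE alignment exactly.

```python
from __future__ import annotations

import shutil
import subprocess


def mafft_command(in_fasta: str) -> list[str]:
    """MAFFT with --auto, which picks an alignment strategy by data size."""
    return ["mafft", "--auto", in_fasta]


def mafft_align(in_fasta: str, out_fasta: str) -> None:
    """Align in_fasta and save the result; MAFFT writes it to stdout."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft is not on PATH")
    with open(out_fasta, "w") as out:
        subprocess.run(mafft_command(in_fasta), stdout=out, check=True)
```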

Cleaned up the aligned files by trimming out everything except the exons.
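
This trimming can be automated if the exon boundaries come from the Arabidopsis "Full length CDS" row in the same alignment. Every column where that row has a base (not a gap) is treated as exonic. Below is a minimal sketch under that assumption. It drops any S.div-specific insertions inside exons and ignores splice-site rules, so its output can differ from manual trimming. The row name in the usage line is hypothetical.

```python
from __future__ import annotations


def exon_columns(reference_row: str, gap: str = "-") -> list[int]:
    """0-based alignment columns where the reference CDS has a base."""
    return [i for i, c in enumerate(reference_row) if c != gap]


def trim_to_columns(alignment: dict[str, str],
                    columns: list[int]) -> dict[str, str]:
    """Keep only the given columns in every row of the alignment."""
    lengths = {len(row) for row in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment rows have different lengths")
    return {name: "".join(row[i] for i in columns)
            for name, row in alignment.items()}
```

Usage: `trim_to_columns(aln, exon_columns(aln["AtFLOE1_CDS"]))`, where `aln` maps row names to aligned sequences.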

Used UGENE to combine and align all of the sequences (Arabidopsis and my trimmed exons).
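
ptg000009l and ptg000010l were extracted for both FLOE 2 and FLOE 3, so a combined file contains two records with the same contig name. Below is a minimal sketch of merging the per-FLOE FASTA files while appending the source label to each header, so the copies stay distinguishable in the alignment and tree. The `name|label` header format is hypothetical.

```python
from __future__ import annotations


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (name, sequence); name is the first word."""
    records: list[tuple[str, str]] = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def combine_tagged(files: dict[str, str], width: int = 60) -> str:
    """Merge FASTA texts keyed by label, writing headers as name|label."""
    out = []
    for label, text in files.items():
        for name, seq in read_fasta(text):
            out.append(f">{name}|{label}")
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Usage: `combine_tagged({"FLOE2": Path("FLOE2.fa").read_text(), "FLOE3": Path("FLOE3.fa").read_text()})`, with hypothetical file names and `Path` from `pathlib`.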

Used that alignment to build a tree. ptg000009l and ptg000010l appeared twice because they were extracted for both FLOE 2 and FLOE 3; we kept only the copies that matched FLOE 2.
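
Keeping only the FLOE 2 copies of the duplicated contigs is a filter over the combined records. Below is a minimal sketch, assuming headers of the form `name|label` (a hypothetical tagging scheme). Records without a label, such as the Arabidopsis sequences, pass through unchanged, and so do contigs not listed in `keep_label_for`.

```python
from __future__ import annotations


def drop_duplicates(records: list[tuple[str, str]],
                    keep_label_for: dict[str, str]) -> list[tuple[str, str]]:
    """Keep only the copy of each listed contig whose label is preferred."""
    kept = []
    for header, seq in records:
        name, _, label = header.partition("|")
        wanted = keep_label_for.get(name)
        if wanted is None or label == wanted:
            kept.append((header, seq))
    return kept


# Contigs that appeared in both the FLOE 2 and FLOE 3 files.
KEEP = {"ptg000009l": "FLOE2", "ptg000010l": "FLOE2"}
```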

BLASTed my trimmed FLOE files against the original S.div assembly to get their coordinates. Results are on this spreadsheet.
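
BLASTing the trimmed exons back against the assembly should give roughly one HSP per exon, and the subject columns hold the genomic coordinates. Below is a minimal sketch that turns default `-outfmt 6` rows for one query into a strand and a sorted exon list. It assumes each exon comes back as a single, non-overlapping HSP; overlapping or fragmented hits would need merging first.

```python
from __future__ import annotations

import csv
import io


def parse_subject_hits(text: str) -> list[tuple[str, int, int]]:
    """(sseqid, sstart, send) from default -outfmt 6 rows (cols 2, 9, 10)."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row and not row[0].startswith("#"):
            hits.append((row[1], int(row[8]), int(row[9])))
    return hits


def exon_coords(hits: list[tuple[str, int, int]],
                contig: str) -> tuple[str, list[tuple[int, int]]]:
    """Strand and sorted 1-based (start, end) exons on one contig."""
    on_contig = [(s, e) for c, s, e in hits if c == contig]
    if not on_contig:
        raise ValueError(f"no hits on {contig}")
    # BLAST reports sstart > send for minus-strand hits.
    strands = {"+" if s <= e else "-" for s, e in on_contig}
    if len(strands) != 1:
        raise ValueError("hits are on both strands")
    return strands.pop(), sorted((min(s, e), max(s, e)) for s, e in on_contig)
```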

BLASTed my trimmed FLOE files against the predicted S.div transcripts. Results are on this spreadsheet.
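
Against the predicted transcripts, the useful question is which transcript covers most of each trimmed FLOE query. Below is a minimal sketch, assuming the search was run with the custom tabular format `-outfmt "6 qseqid sseqid pident length qstart qend qlen bitscore"` so the query length is available. Coverage here is the fraction of query positions covered by at least one HSP.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def query_coverage(text: str) -> dict[tuple[str, str], float]:
    """Fraction of each query covered by HSPs, per (query, transcript)."""
    covered: dict[tuple[str, str], set[int]] = defaultdict(set)
    qlen: dict[str, int] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        qseqid, sseqid, _pident, _length, qstart, qend, ql, _bits = row[:8]
        lo, hi = sorted((int(qstart), int(qend)))
        covered[(qseqid, sseqid)].update(range(lo, hi + 1))
        qlen[qseqid] = int(ql)
    return {key: len(pos) / qlen[key[0]] for key, pos in covered.items()}


def best_transcript(cov: dict[tuple[str, str], float], query: str) -> str:
    """Transcript with the highest coverage of `query`."""
    return max((k for k in cov if k[0] == query), key=lambda k: cov[k])[1]
```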

The GFF file was missing annotation for ptg000013l, so I added gene, exon, and CDS features for it.
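
GFF3 features have nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) with 1-based inclusive coordinates, and CDS rows need a phase. Below is a minimal sketch that writes such lines from exon coordinates, assuming the exons are entirely coding (no UTRs), which fits exons trimmed to the CDS alignment. The gene ID and `source` value are hypothetical. The sketch also writes an mRNA feature between the gene and its exons/CDS because many GFF3 tools expect that hierarchy. Drop it if the existing file links exons directly to genes.

```python
from __future__ import annotations


def _row(seqid: str, source: str, ftype: str, start: int, end: int,
         strand: str, phase: str, attrs: str) -> str:
    # Column 6 (score) is left empty as ".".
    return "\t".join([seqid, source, ftype, str(start), str(end), ".",
                      strand, phase, attrs])


def gff3_gene_lines(seqid: str, gene_id: str, strand: str,
                    exons: list[tuple[int, int]],
                    source: str = "manual") -> list[str]:
    """gene, mRNA, exon and CDS lines for an all-coding gene model."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]
    mrna_id = f"{gene_id}.t1"
    lines = [
        _row(seqid, source, "gene", start, end, strand, ".", f"ID={gene_id}"),
        _row(seqid, source, "mRNA", start, end, strand, ".",
             f"ID={mrna_id};Parent={gene_id}"),
    ]
    # Transcription order: ascending on "+", descending on "-".
    ordered = exons if strand == "+" else exons[::-1]
    for i, (s, e) in enumerate(ordered, 1):
        lines.append(_row(seqid, source, "exon", s, e, strand, ".",
                          f"ID={mrna_id}.exon{i};Parent={mrna_id}"))
    coding_so_far = 0
    for s, e in ordered:
        # Phase: bases to skip at the start of this CDS to reach a codon start.
        phase = (3 - coding_so_far % 3) % 3
        lines.append(_row(seqid, source, "CDS", s, e, strand, str(phase),
                          f"ID={mrna_id}.cds;Parent={mrna_id}"))
        coding_so_far += e - s + 1
    return lines
```

All CDS rows share one ID, which GFF3 allows for a feature split across several lines.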

Now working on differential gene expression from RNA-seq (lab on May 23, 2023; see the scripts folder).