Comments (10)
I also just fixed the L-probe example file so that it should now work consistently, and changed the pandas requirements again (it turns out I had changed them in a test branch and never merged the changes into the main branch).
Please keep finding bugs!
Marco
from lee_2023.
I haven't changed the Biopython requirements, as you really get a warning rather than an error. So I'll try to fix it with a bit more time in a later release :)
Thanks Ines,
we'll try to fix the biopython thing in the next update.
Can you please post your matplotlib error?
Thx!
Marco
Hi Marco,
the matplotlib error was
ModuleNotFoundError: No module named 'matplotlib'
I just installed matplotlib and it worked fine.
I now managed to run through the whole notebook (I can post these issues separately if preferred/or change the title of this issue? but here summed up for now):
I had the same pandas error as here, even though I freshly pulled from git yesterday. First I tried to change .append to pd.concat in probedesign.py, which did not work, obviously. I then forced pandas==1.1.5, as suggested in the linked issue, which fixed it.
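For reference, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, which is why pinning an older pandas makes the call work again. Below is a minimal sketch of the equivalent pd.concat pattern. The two-column table and its values are made up for illustration and are not the actual data handled in probedesign.py.

```python
import pandas as pd

# Toy stand-in for a table that is extended one row at a time
# (column names and values are illustrative only).
existing = pd.DataFrame([{"gene": "GeneA", "Lbar_ID": "201"}])
new_row = {"gene": "GeneB", "Lbar_ID": "202"}

# Old pattern, removed in pandas 2.0:
#   existing = existing.append(new_row, ignore_index=True)
# Replacement: wrap the row in a one-row DataFrame and concatenate.
combined = pd.concat([existing, pd.DataFrame([new_row])], ignore_index=True)
print(combined)
```

When many rows are added in a loop, collecting the dictionaries in a list and building the DataFrame once at the end is usually cheaper than concatenating on every iteration.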
Since I still had to input the cutadapt file path, it took me a while to realise that it was loading probedesign.py from the .egg module, so changing the probedesign.py file would not do much. I had not worked with modules/.egg files before, so it was quicker for me to do the following:
swap the following line
from PLP_directRNA_design import probedesign as plp
for
import sys
# Add the folder path to sys.path
sys.path.insert(0, '/user/pathto/PLP_directRNA_design')
import probedesign as plp
Not sure if there is an easier way (or if this might have broken things?).
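One way to see which copy of a module Python will pick up, before or after changing sys.path, is to ask the import system for the module's origin. The sketch below uses only the standard library; the helper name module_origin is made up for illustration, and it is demonstrated on the json module rather than on the package discussed here.

```python
import importlib.util

def module_origin(name):
    """Return the file path Python would load `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# Demonstrated on a standard-library module. For the case above one would
# pass "PLP_directRNA_design" instead; a path containing ".egg" would then
# point to the installed copy rather than the edited source folder.
print(module_origin("json"))
```

For a module that is already imported, printing its __file__ attribute (for example plp.__file__) gives the same information.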
Another issue I had was in the "Assign Genes to Barcode" section, where I tested how="start", on=LbarID. Since it wasn't entirely clear to me whether to use LbarID or a different variable/value, I tried all variations of "LbarID"/LbarID, numbers, column IDs, even barcode ID, etc. (maybe a full example could help?), until I figured out that the issue was in Lprobe_Ver2.csv. For clarity, this is the error I received:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/miniconda3/envs/probedesign/lib/python3.9/site-packages/pandas/core/indexes/base.py:3361, in Index.get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
File ~/miniconda3/envs/probedesign/lib/python3.9/site-packages/pandas/_libs/index.pyx:76, in pandas._libs.index.IndexEngine.get_loc()
File ~/miniconda3/envs/probedesign/lib/python3.9/site-packages/pandas/_libs/index.pyx:108, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'number'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[21], line 2
1 customizedlib=r"C:\Users\sergio.salas\Documents\PhD\projects\gene_design\hower_example_5\assigned_gene_LID.csv"
----> 2 probes=plp.build_plps(path,specific_seqs_final,L_probe_library,plp_length,how='start',on="Lbar_ID")
File ~/02-spatial_transcriptomics/01-dataanalysis/08-ISS_processing/PLP_directRNA_design/PLP_directRNA_design/probedesign.py:510, in build_plps(path, specific_seqs_final, L_probe_library, plp_length, how, on)
508 n=0
509 for g in gname:
--> 510 gene_names_ID = gene_names_ID.append({"gene": g, "idseq" : np.array(sbh.loc[sbh['number']==ID+n,'ID_Seq'])[0], "Lbar_ID" : str(np.array(sbh.loc[sbh['number']==ID+n,'Lbar_ID'])[0]), "AffyID" : np.array(sbh.loc[sbh['number']==ID+n,'L_Affy_ID'])[0] }, ignore_index=True)
511 n=n+1
512 gene_names_ID2=gene_names_ID.set_index("gene", drop = False)
File ~/miniconda3/envs/probedesign/lib/python3.9/site-packages/pandas/core/frame.py:3458, in DataFrame.__getitem__(self, key)
3456 if self.columns.nlevels > 1:
3457 return self._getitem_multilevel(key)
-> 3458 indexer = self.columns.get_loc(key)
3459 if is_integer(indexer):
3460 indexer = [indexer]
File ~/miniconda3/envs/probedesign/lib/python3.9/site-packages/pandas/core/indexes/base.py:3363, in Index.get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3365 if is_scalar(key) and isna(key) and not self.hasnans:
3366 raise KeyError(key)
KeyError: 'number'
I changed the values of the Lbar_ID rows ("LbarID_0") in the .csv to a number (201>) and changed sbh['number'] to sbh['Lbar_ID'], as shown here:
for g in gname:
    gene_names_ID = gene_names_ID.append({"gene": g, "idseq" : np.array(sbh.loc[sbh['Lbar_ID']==ID+n,'ID_Seq'])[0], "Lbar_ID" : str(np.array(sbh.loc[sbh['Lbar_ID']==ID+n,'Lbar_ID'])[0]), "AffyID" : np.array(sbh.loc[sbh['Lbar_ID']==ID+n,'L_Affy_ID'])[0] }, ignore_index=True)
Now it runs.
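A KeyError like the one above can be caught earlier by checking that the library table has the columns the lookup expects before it is passed on. The following is a minimal sketch. The column names are taken from the traceback (line 510 of probedesign.py), while the helper missing_columns and the toy table are made up for illustration.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df, in the given order."""
    return [col for col in required if col not in df.columns]

# Toy table mimicking a library file that lacks the 'number' column.
sbh = pd.DataFrame({"Lbar_ID": ["LbarID_0"], "ID_Seq": ["ACGT"], "L_Affy_ID": ["A1"]})
required = ["number", "ID_Seq", "Lbar_ID", "L_Affy_ID"]

missing = missing_columns(sbh, required)
print(missing)  # ['number']
```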
Additionally, I was wondering if you ever had issues with duplicate sequences? I tried this pipeline with a random gene (Zfp85) and got 6 PLP sequences, 2 of which are duplicates.
The results are attached here:
good_targetsfinal.csv
designed_PLPs_final.csv
Thank you!
Ines
Hi Ines,
interesting, the pandas thing should have been fixed by a previous update. I'll double check.
The same goes for the redundant probes: we had solved this in a previous version of our code, but somehow it made it in here.
Give me a few days to go through the code.
M.
Ok here I am.
This is for both @Boehmin and @Sverreg (commenting on the issue mentioned here).
You get duplicate sequences (or sequences within a ±20 nt range, which overlap and should be excluded; you can check this in the Position column) most likely because your gene is a bad substrate for the probes. Either it's too short or it doesn't comply very well with our GC requirement.
When the search doesn't find at least as many targets as you specified (default=5), it automatically returns all the targets it found, without excluding overlaps.
Here's the relevant code snippet from the select_sequences function in PLP_design.py:
# Fewer candidates than requested: return all of them, with no overlap filtering.
if ele.shape[0]<number_of_selected:
    selec=ele
else:
    for num in range(0,number_of_selected):
        if ele.shape[0]>0:
            # Pick one of the remaining candidates at random.
            randomlist = random.sample(range(0, ele.shape[0]), 1)
            sele=ele.iloc[randomlist,:]
            try:
                seleall=pd.concat([seleall,sele])
            except:
                seleall=sele
            # Remove candidates positioned from 20 nt before to 19 nt after the pick.
            exclude=list(range(int(sele['Position']-20),int(sele['Position']+20)))
            ele=ele.loc[~ele['Position'].isin(exclude),:]
    selec=seleall
selected2=pd.concat([selected2,selec])
I'd suggest running the search again for these genes, relaxing the GC content a bit or dropping the requirement for a terminal G/C. Maybe that will fix the issue. Keep in mind that sometimes it's impossible to design "good" probes against some genes. You can try your chances anyway and design them manually, and they might work...
Please let me know if my explanation doesn't make sense.
Cheers,
Marco
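To get a feel for what relaxing these two criteria does to a given candidate, the checks can be sketched as below. This is not the pipeline's own filter: the function names, the thresholds, and the assumption that "terminal" means the last base of the sequence are all hypothetical and chosen only for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def passes_filters(seq, gc_min, gc_max, require_terminal_gc=True):
    """Check GC content and, optionally, that the last base is G or C (assumed meaning)."""
    if not gc_min <= gc_fraction(seq) <= gc_max:
        return False
    if require_terminal_gc and seq[-1:].upper() not in ("G", "C"):
        return False
    return True

candidate = "ATGCATGCAT"  # toy 10-mer: GC fraction 0.4, ends in T

# Hypothetical "strict" and "relaxed" settings, not the pipeline defaults.
strict = passes_filters(candidate, gc_min=0.5, gc_max=0.65)
relaxed = passes_filters(candidate, gc_min=0.4, gc_max=0.7, require_terminal_gc=False)
print(gc_fraction(candidate), strict, relaxed)
```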
Hi Marco,
thank you for the explanation! I'll keep that in mind and will give this a go. Just to check, would lowering the target requirement to 3 or 4 potentially also help?
Cheers,
Ines
Hi Ines,
regarding your last question: I think it's wise to have 5 probes (targets) per gene if possible, as this ensures high detection efficiency. While setting the target requirement to 3 or 4 will help solve the issue above, I'd still try to design 5 and do a manual check to remove duplicates and overlaps.
Cheers and sorry for the late reply!
Marco
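The manual check for duplicates and overlaps could also be scripted. The sketch below is one possible approach, not part of the pipeline: it keeps the first occurrence of each sequence and then walks through the targets by position, dropping any target that starts fewer than 20 nt after the last kept one. The 'Position' column name comes from the snippet above, while the 'Sequence' column name, the function name, and the toy data are assumptions.

```python
import pandas as pd

def drop_duplicates_and_overlaps(targets, min_distance=20):
    """Remove repeated sequences, then greedily drop targets closer than
    min_distance nt to the previously kept target (sorted by position)."""
    unique = targets.drop_duplicates(subset="Sequence").sort_values("Position")
    kept = []
    last = None
    for idx, pos in unique["Position"].items():
        if last is None or pos - last >= min_distance:
            kept.append(idx)
            last = pos
    return unique.loc[kept]

# Toy targets: one exact duplicate and one overlapping neighbour.
targets = pd.DataFrame({
    "Sequence": ["AAA", "AAA", "CCC", "GGG"],
    "Position": [100, 100, 110, 150],
})
filtered = drop_duplicates_and_overlaps(targets)
print(filtered)
```

Unlike the random sampling in select_sequences, this greedy pass always keeps the left-most target of an overlapping group, so the two approaches can select different probes.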
Hi Marco,
thanks for the tip. I'll test this again.
Maybe you can answer another question. I am trying to design probes that are as species-agnostic as possible between mouse and human (following the description in the pre-print).
Do I understand correctly that I should:
- run the extract and align sequences for mouse & human
- Concatenate results?
- run plp.select_sequences() on mouse + human extracted 30kmers (run this on mouse & human separately or combined?)
- run plp.map_sequences() on selected sequences above, keep only 30mers found in both species, run this twice; once against human, once against mouse
- Continue normally to end like in the tutorial notebook
I'm also slowly going through the ISS analysis notebooks, so I will start a separate issue should I run into bugs there. :)
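The "keep only 30mers found in both species" step from the list above can be illustrated, independently of the plp functions, as a set intersection of k-mers. This is only a sketch under the assumption that exact matches are wanted. It does not reproduce what plp.map_sequences() does, and the function names and toy sequences (with k=4 instead of 30) are made up for illustration.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(seq_a, seq_b, k=30):
    """k-mers that occur, as exact matches, in both sequences."""
    return kmers(seq_a, k) & kmers(seq_b, k)

# Toy sequences standing in for the mouse and human transcripts.
mouse = "ACGTACGGA"
human = "TTACGTACC"
shared = shared_kmers(mouse, human, k=4)
print(sorted(shared))  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```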
Hi @mgcizzu ,
one more question, is the anchor sequence in the notebook the primer sequence? Since you use a pseudo-anchor as described in the supplementary, I assume the anchor in this notebook is the complementary sequence of your RCA primer?
Thank you!
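The question above is not answered in this thread. Whichever sequence the anchor turns out to be, checking whether two oligos are complementary comes down to comparing one with the reverse complement of the other, which can be sketched as follows. The function name and the toy sequences are made up for illustration and are not the actual anchor or primer.

```python
# Translation table for DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

primer = "ATGCC"           # toy sequence, not the real RCA primer
candidate_anchor = "GGCAT"  # toy sequence, not the real anchor
print(reverse_complement(primer) == candidate_anchor)  # True
```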