tyronechen / genomenlp
Home Page: https://genomenlp.readthedocs.io/en/latest/
License: MIT License
Describe the bug
Running train fails if the dataset object already has encoded class labels.
This is because it tries to coerce the already-encoded labels to a class type and fails.
To Reproduce
Run train with any data
Expected behavior
Training to be conducted as normal
Suggested fix
Add a condition that skips the type change when the column already has the correct type.
Screenshots
NA
Issue Description:
While attempting to reproduce the findings in your preprint, I'm encountering an error running the tokenise_bio.py script on large (3GB or larger) FASTA files, including the GCF_000001405.39_GRCh38.p13. It fails with a panic in the Rust code. The code runs successfully on (much) smaller FASTA files (200KB). I have tried using the original .fna file extension as well as .fasta but have not had success in getting past this particular error.
The error log shows a panic in the Rust code of the Hugging Face Tokenizers library (tokenizers-lib). The panic occurred at tokenizers-lib/src/models/unigram/trainer.rs:212:53, with the message "called Result::unwrap() on an Err value: Internal".
To Reproduce
OS: EC2 Instance
Conda venv with python 3.9
Command Used for Error:
RUST_BACKTRACE=full tokenise_bio -i /data/ncbi_dataset/GCF_000001405.39_GRCh38.p13_genomic.fasta -t /data/generated/ncbi_tokenisers/tokeniser_39_GRCh38.json
Output & error message: https://gist.github.com/stepwise-ai-dev/f23a79faaedd006bf51d486259440dd5
Steps to Reproduce:
Run the tokenise_bio.py script with GCF_000001405.39_GRCh38.p13_genomic.fna as input
Expected output
I expected output similar to the previous successful execution of tokenise_bio.py on a smaller (200KB) FASTA file.
Successful output: https://gist.github.com/stepwise-ai-dev/bbd782f3d09afaca6219b2c3a176bb42
Questions:
Any guidance on resolving this issue would be greatly appreciated.
Thank you so much!
Describe the bug
Many users are confused about what to put for --label_names
To Reproduce
NA
Expected behavior
NA
Suggested fix
Add additional help text and possibly argument name change
Screenshots
NA
Describe the bug
Installing the dependencies fails because pytorch 1.10 is not available.
To Reproduce
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Expected behavior
It installs with no issue.
Suggested fix
The package is published on PyPI as torch, not pytorch. Replace pytorch with torch in requirements.txt.
Describe the bug
Tokenised transformers dataset object csv files are truncated if the sequence is too long.
To Reproduce
Fields input_tokens, token_type_ids and attention_mask are truncated if the feature is too long. This is true for the output csv file only.
# sample run on arbitrary file with very long item
create_dataset_bio <infile_path_1> <infile_path_2> <tokeniser>
# sample output csv file
some_seq,<very very long sequence>,1,"[10 ... 20]","[0 ... 0]","[1 ... 1]"
Environment info, including python and dependency versions (obtained with conda list):
# this was installed with conda install -c tyronechen ziran
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
_py-xgboost-mutex 2.0 cpu_0 conda-forge
abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge
aiohttp 3.8.4 py39h72bdee0_0 conda-forge
aiohttp-cors 0.7.0 py_0 conda-forge
aioredis 1.3.1 py_0 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
alsa-lib 1.2.8 h166bdaf_0 conda-forge
arrow-cpp 8.0.0 py39heccc63a_1_cpu conda-forge
async-timeout 4.0.2 pyhd8ed1ab_0 conda-forge
attr 2.5.1 h166bdaf_1 conda-forge
attrs 22.2.0 pyh71513ae_0 conda-forge
aws-c-cal 0.5.11 h95a6274_0 conda-forge
aws-c-common 0.6.2 h7f98852_0 conda-forge
aws-c-event-stream 0.2.7 h3541f99_13 conda-forge
aws-c-io 0.10.5 hfb6a706_0 conda-forge
aws-checksums 0.1.11 ha31a3da_7 conda-forge
aws-sdk-cpp 1.8.186 hecaee15_4 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 pyhd8ed1ab_3 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
blessed 1.19.1 pyhe4f9e05_2 conda-forge
brotli 1.0.9 h166bdaf_8 conda-forge
brotli-bin 1.0.9 h166bdaf_8 conda-forge
brotlipy 0.7.0 py39hb9d737c_1005 conda-forge
bz2file 0.98 py_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.12.7 ha878542_0 conda-forge
cachetools 5.3.0 pyhd8ed1ab_0 conda-forge
cairo 1.16.0 ha61ee94_1014 conda-forge
captum 0.6.0 pyhd8ed1ab_0 conda-forge
certifi 2022.12.7 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py39he91dace_3 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 8.0.4 py39hf3d152e_0 conda-forge
cloudpickle 2.2.1 pyhd8ed1ab_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
colorful 0.5.4 pyhd8ed1ab_0 conda-forge
cryptography 39.0.0 py39hd598818_0 conda-forge
cudatoolkit 11.8.0 h37601d7_11 conda-forge
cudnn 8.4.1.50 hed8a83a_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dataclasses 0.8 pyhc8e2a94_3 conda-forge
datasets 2.10.1 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
dill 0.3.6 pyhd8ed1ab_1 conda-forge
distlib 0.3.6 pyhd8ed1ab_0 conda-forge
docker-pycreds 0.4.0 py_0 conda-forge
expat 2.5.0 h27087fc_0 conda-forge
fftw 3.3.10 nompi_hf0379b8_106 conda-forge
filelock 3.10.0 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.39.2 py39h72bdee0_0 conda-forge
freetype 2.12.1 hca18f0e_1 conda-forge
frozenlist 1.3.3 py39hb9d737c_0 conda-forge
fsspec 2023.3.0 pyhd8ed1ab_1 conda-forge
future 0.18.3 pyhd8ed1ab_0 conda-forge
gensim 4.2.0 py39h1832856_0 conda-forge
gettext 0.21.1 h27087fc_0 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
gitdb 4.0.10 pyhd8ed1ab_0 conda-forge
gitpython 3.1.31 pyhd8ed1ab_0 conda-forge
glib 2.74.1 h6239696_1 conda-forge
glib-tools 2.74.1 h6239696_1 conda-forge
glog 0.6.0 h6f12383_0 conda-forge
google-api-core 2.10.0 pyhd8ed1ab_0 conda-forge
google-auth 2.16.2 pyh1a96a4e_0 conda-forge
googleapis-common-protos 1.57.0 py39hf3d152e_0 conda-forge
gpustat 1.0.0 pyhd8ed1ab_0 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
grpc-cpp 1.43.2 h9e046d8_3 conda-forge
grpcio 1.43.0 py39hff7568b_0 conda-forge
gst-plugins-base 1.21.3 h4243ec0_1 conda-forge
gstreamer 1.21.3 h25f0c4b_1 conda-forge
gstreamer-orc 0.4.33 h166bdaf_0 conda-forge
harfbuzz 6.0.0 h8e241bc_0 conda-forge
hiredis 2.0.0 py39hb9d737c_3 conda-forge
huggingface_hub 0.13.2 pyhd8ed1ab_0 conda-forge
hyperopt 0.2.7 pyhd8ed1ab_0 conda-forge
icu 70.1 h27087fc_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
importlib-metadata 6.0.0 pyha770c72_0 conda-forge
importlib_metadata 6.0.0 hd8ed1ab_0 conda-forge
importlib_resources 5.12.0 pyhd8ed1ab_0 conda-forge
ipython 7.33.0 py39hf3d152e_0 conda-forge
jack 1.9.22 h11f4161_0 conda-forge
jedi 0.18.2 pyhd8ed1ab_0 conda-forge
joblib 1.2.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h0b41bf4_3 conda-forge
jsonschema 4.17.3 pyhd8ed1ab_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.4 py39hf939315_1 conda-forge
krb5 1.20.1 hf9c8cef_0 conda-forge
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.15 hfd0df8a_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libblas 3.9.0 12_linux64_mkl conda-forge
libbrotlicommon 1.0.9 h166bdaf_8 conda-forge
libbrotlidec 1.0.9 h166bdaf_8 conda-forge
libbrotlienc 1.0.9 h166bdaf_8 conda-forge
libcap 2.66 ha37c62d_0 conda-forge
libcblas 3.9.0 12_linux64_mkl conda-forge
libclang 15.0.7 default_had23c3d_1 conda-forge
libclang13 15.0.7 default_h3e3d535_1 conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcups 2.3.3 h36d4200_3 conda-forge
libcurl 7.87.0 h6312ad2_0 conda-forge
libdb 6.2.32 h9c3ff4c_0 conda-forge
libdeflate 1.17 h0b41bf4_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libflac 1.4.2 h27087fc_0 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgcrypt 1.10.1 h166bdaf_0 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libglib 2.74.1 h606061b_1 conda-forge
libgoogle-cloud 1.36.0 h6945097_0 conda-forge
libgpg-error 1.46 h620e276_0 conda-forge
libhwloc 2.9.0 hd6dc26d_0 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
liblapack 3.9.0 12_linux64_mkl conda-forge
libllvm15 15.0.7 hadd5161_1 conda-forge
libnghttp2 1.51.0 hdcd2b5c_0 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libogg 1.3.4 h7f98852_1 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.39 h753d276_0 conda-forge
libpq 15.1 h2baec63_3 conda-forge
libprotobuf 3.19.4 h780b84a_0 conda-forge
libsndfile 1.2.0 hb75c966_0 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libssh2 1.10.0 haa6b8db_3 conda-forge
libstdcxx-ng 12.2.0 h46fd767_19 conda-forge
libsystemd0 252 h2a991cd_0 conda-forge
libthrift 0.16.0 h491838f_2 conda-forge
libtiff 4.5.0 h6adf6a1_2 conda-forge
libtool 2.4.7 h27087fc_0 conda-forge
libudev1 253 h0b41bf4_0 conda-forge
libunwind 1.6.2 h9c3ff4c_0 conda-forge
libutf8proc 2.8.0 h166bdaf_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.3.0 h0b41bf4_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libxgboost 1.7.1 cpu_ha3b9936_0 conda-forge
libxkbcommon 1.5.0 h79f4944_1 conda-forge
libxml2 2.10.3 hca2bb57_3 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
llvm-openmp 15.0.7 h0cdce71_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
magma 2.5.4 hc72dce7_4 conda-forge
matplotlib 3.5.2 py39hf3d152e_1 conda-forge
matplotlib-base 3.5.2 py39h700656a_1 conda-forge
matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge
mkl 2021.4.0 h8d4b97c_729 conda-forge
mpg123 1.31.2 hcb278e6_0 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
msgpack-python 1.0.5 py39h4b4f3f3_0 conda-forge
multidict 6.0.4 py39h72bdee0_0 conda-forge
multiprocess 0.70.14 py39hb9d737c_3 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.0.32 h14678bc_0 conda-forge
mysql-libs 8.0.32 h54cf53e_0 conda-forge
nccl 2.14.3.1 h0800d71_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
networkx 3.0 pyhd8ed1ab_0 conda-forge
ninja 1.11.1 h924138e_0 conda-forge
nipals 0.5.5 pypi_0 pypi
nspr 4.35 h27087fc_0 conda-forge
nss 3.89 he45b914_0 conda-forge
numpy 1.24.2 py39h7360e5f_0 conda-forge
nvidia-ml-py 11.495.46 pyhd8ed1ab_0 conda-forge
opencensus 0.11.2 pyhd8ed1ab_0 conda-forge
opencensus-context 0.1.3 py39hf3d152e_1 conda-forge
openjpeg 2.5.0 hfec8fc6_2 conda-forge
openssl 1.1.1t h0b41bf4_0 conda-forge
orc 1.7.3 h1be678f_0 conda-forge
packaging 23.0 pyhd8ed1ab_0 conda-forge
pandas 1.4.2 py39h1832856_2 conda-forge
parquet-cpp 1.5.1 2 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
pathtools 0.1.2 py_1 conda-forge
patsy 0.5.3 pyhd8ed1ab_0 conda-forge
pcre2 10.40 hc3806b6_0 conda-forge
pexpect 4.8.0 pyh1a96a4e_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 9.4.0 py39h2320bf1_1 conda-forge
pip 23.0.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_0 conda-forge
platformdirs 3.1.1 pyhd8ed1ab_0 conda-forge
ply 3.11 py_1 conda-forge
pooch 1.7.0 pyhd8ed1ab_0 conda-forge
powerlaw 1.4.6 pyh9f0ad1d_1 conda-forge
prometheus_client 0.13.1 pyhd8ed1ab_0 conda-forge
promise 2.3 py39hf3d152e_7 conda-forge
prompt-toolkit 3.0.38 pyha770c72_0 conda-forge
protobuf 3.19.4 py39he80948d_0 conda-forge
psutil 5.9.4 py39hb9d737c_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pulseaudio 16.1 h4ab2085_1 conda-forge
py-spy 0.3.14 h87a5ac0_0 conda-forge
py-xgboost 1.7.1 cpu_py39h4655687_0 conda-forge
py4j 0.10.9.7 pyhd8ed1ab_0 conda-forge
pyarrow 8.0.0 py39h42d110c_1_cpu conda-forge
pyasn1 0.4.8 py_0 conda-forge
pyasn1-modules 0.2.7 py_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pygments 2.14.0 pyhd8ed1ab_0 conda-forge
pyopenssl 23.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyqt 5.15.7 py39h5c7b992_3 conda-forge
pyqt5-sip 12.11.0 py39h227be39_3 conda-forge
pyrsistent 0.19.3 py39h72bdee0_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.9.15 h47a2c10_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-xxhash 3.2.0 py39h72bdee0_0 conda-forge
python_abi 3.9 3_cp39 conda-forge
pytorch 1.10.0 cuda112py39h3ad47f5_1 conda-forge
pytz 2022.7.1 pyhd8ed1ab_0 conda-forge
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyyaml 6.0 py39hb9d737c_5 conda-forge
qt-main 5.15.6 h18908ee_6 conda-forge
ray-core 1.13.0 py39hecbb631_2 conda-forge
ray-default 1.13.0 py39hf3d152e_2 conda-forge
re2 2022.02.01 h9c3ff4c_0 conda-forge
readline 8.1.2 h0f457ee_0 conda-forge
regex 2022.10.31 py39hb9d737c_0 conda-forge
requests 2.28.2 pyhd8ed1ab_0 conda-forge
responses 0.18.0 pyhd8ed1ab_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
s2n 1.0.10 h9b69904_0 conda-forge
sacremoses 0.0.53 pyhd8ed1ab_0 conda-forge
scikit-learn 1.1.1 py39h4037b75_0 conda-forge
scipy 1.10.1 py39h7360e5f_0 conda-forge
screed 1.0.5 pyhd8ed1ab_1 conda-forge
seaborn 0.11.2 hd8ed1ab_0 conda-forge
seaborn-base 0.11.2 pyhd8ed1ab_0 conda-forge
sentencepiece 0.1.96 py39hf939315_1 conda-forge
sentry-sdk 1.17.0 pyhd8ed1ab_0 conda-forge
setproctitle 1.2.2 py39hb9d737c_2 conda-forge
setuptools 67.6.0 pyhd8ed1ab_0 conda-forge
shortuuid 1.0.11 pyhd8ed1ab_0 conda-forge
sip 6.7.7 py39h227be39_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleef 3.5.1 h9b69904_2 conda-forge
smart_open 6.3.0 pyhd8ed1ab_1 conda-forge
smmap 3.0.5 pyh44b312d_0 conda-forge
snappy 1.1.10 h9fff704_0 conda-forge
statsmodels 0.13.5 py39h2ae25f5_2 conda-forge
tabulate 0.9.0 pyhd8ed1ab_1 conda-forge
tbb 2021.8.0 hf52228f_0 conda-forge
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tokenizers 0.12.1 py39h3045328_1 conda-forge
toml 0.10.2 pyhd8ed1ab_0 conda-forge
tornado 6.2 py39hb9d737c_1 conda-forge
tqdm 4.64.0 pyhd8ed1ab_0 conda-forge
traitlets 5.9.0 pyhd8ed1ab_0 conda-forge
transformers 4.23.1 pyhd8ed1ab_0 conda-forge
transformers-interpret 0.8.1 pyhd8ed1ab_0 conda-forge
typing-extensions 4.5.0 hd8ed1ab_0 conda-forge
typing_extensions 4.5.0 pyha770c72_0 conda-forge
tzdata 2022g h191b570_0 conda-forge
unicodedata2 15.0.0 py39hb9d737c_0 conda-forge
urllib3 1.26.15 pyhd8ed1ab_0 conda-forge
virtualenv 20.21.0 pyhd8ed1ab_0 conda-forge
wandb 0.13.4 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.6 pyhd8ed1ab_0 conda-forge
weightwatcher 0.6.4 py_0 tyronechen
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
xcb-util 0.4.0 h516909a_0 conda-forge
xcb-util-image 0.4.0 h166bdaf_0 conda-forge
xcb-util-keysyms 0.4.0 h516909a_0 conda-forge
xcb-util-renderutil 0.3.9 h166bdaf_0 conda-forge
xcb-util-wm 0.4.1 h516909a_0 conda-forge
xgboost 1.7.1 cpu_py39h4655687_0 conda-forge
xkeyboard-config 2.38 h0b41bf4_0 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.8.4 h0b41bf4_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xxhash 0.8.1 h0b41bf4_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.8.2 py39hb9d737c_0 conda-forge
yellowbrick 1.3.post1 pyhd8ed1ab_1 conda-forge
zipp 3.15.0 pyhd8ed1ab_0 conda-forge
ziran 1.0.9 0 tyronechen
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h3eb15da_6 conda-forge
Expected behavior
csv files should not have truncated array values.
Suggested fix
Temporary fix: Use parquet and json files as input for training since these are unaffected.
Long term fix: Increase the array size limit for printing on pandas and/or numpy.
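The "[10 ... 20]" pattern in the sample output looks like NumPy's default summarisation of arrays with more than 1000 elements when they are converted to strings. The sketch below is not the project's code; it assumes the array columns are NumPy arrays that get stringified on the way to csv, and uses a made-up array (input_ids) as a stand-in for a long tokenised sequence. It shows two possible ways around the truncation: converting to a plain list, or raising the print threshold.

```python
import sys

import numpy as np

# Hypothetical stand-in for one long tokenised sequence (more than 1000 items).
input_ids = np.arange(2000)

# Default behaviour: arrays above the print threshold (1000 by default)
# are summarised with "..." when converted to a string.
truncated = str(input_ids)

# Option 1: convert to a plain Python list before stringifying.
# Lists are never summarised, whatever their length.
as_list = str(input_ids.tolist())

# Option 2: temporarily raise NumPy's print threshold so that the
# full array is written out.
with np.printoptions(threshold=sys.maxsize):
    full = str(input_ids)

print("..." in truncated, "..." in as_list, "..." in full)
```

Option 1 changes the stored format slightly (comma-separated values), so any downstream parser of the csv column would need to accept that form. Option 2 keeps the existing space-separated format.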
Screenshots
Not applicable
Describe the bug:
When I attempt to run the create_embedding_bio_sp.py script, it fails with a TypeError: 'float' object is not subscriptable. The error seems to be due to the script's inability to handle 'float' values in the data.
To Reproduce:
Here are the exact commands that I ran:
tokenise_bio.py script:
source activate myenv && python "tokenise_bio.py" -i "<your_path>/storage/FASTA_small/smallfasta1.fasta" -t "<your_path>/storage/tokenizer-001.json"
create_dataset_bio.py script:
source activate myenv && python "create_dataset_bio.py" "<your_path>/storage/FASTA_small/smallfasta1.fasta" "<your_path>/storage/FASTA_small/smallfasta2.fasta" "<your_path>/storage/tokenizer-001.json" -c 200 -o "<your_path>/storage/dataset-bio-003"
create_embedding_bio_sp.py script:
source activate myenv && python "create_embedding_bio_sp.py" -i "<your_path>/storage/dataset-bio-003/train.csv" -t "<your_path>/storage/tokenizer-001.json" -o "<your_path>/storage/embeddings-bio-002"
Expected behavior:
I expected the create_embedding_bio_sp.py script to process the dataset and generate the embeddings without errors.
Error message:
TypeError: 'float' object is not subscriptable
Suggested fix:
Unclear. The issue appears to be in utils.py at line 895, in <lambda>:
sp = data[column].apply(lambda x: x[1:-1].replace("'", "").split())
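One possible explanation, offered as an assumption rather than a confirmed diagnosis: pandas represents missing cells as NaN, which is a float, so slicing with x[1:-1] fails on any empty row in the column. The sketch below uses a hypothetical helper (parse_token_string) and made-up rows to show a guarded version of the same parsing logic that returns an empty list for non-string values.

```python
import math


def parse_token_string(value):
    """Split a stringified token array such as "['AC' 'GT']" into tokens.

    Non-string values (for example NaN floats from empty csv cells)
    yield an empty list instead of raising a TypeError.
    """
    if not isinstance(value, str):
        return []
    # Same logic as the lambda in the traceback: drop the surrounding
    # brackets, remove quotes, and split on whitespace.
    return value[1:-1].replace("'", "").split()


# Hypothetical column contents: two valid rows and one missing value.
rows = ["['AC' 'GT' 'TTA']", math.nan, "[]"]
parsed = [parse_token_string(row) for row in rows]
print(parsed)
```

If this is the cause, the helper could be passed to data[column].apply in place of the lambda, or rows with missing values could be dropped before parsing. Whether empty rows should be kept or dropped depends on what the embedding step expects.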
Describe the bug
During build, this error is received.
Error while loading conda entry point: conda-libmamba-solver (libarchive.so.19: cannot open shared object file: No such file or directory)
To Reproduce
Run build (see workflow file)
Expected behavior
Build correctly.
Suggested fix
Possible fix
So far, installing libarchive==3.6.2 has been attempted. This was ineffective.
Describe the bug
It seems that the 'ray-default' channel is causing issues during the environment creation process. The error indicates that the package metadata for 'ray-default' is not accessible, resulting in a 404 error.
To Reproduce
I encountered issues while attempting to set up the GenomeNLP environment using Miniconda and Mamba.
Below are the steps I followed, the errors encountered, and the attempted solutions:
Step 1: Miniconda Installation
Step 2: Mamba Installation
Here I encountered the following error due to an inaccessible channel ray-default: HTTP 404 NOT FOUND for channel ray-default
Attempted solution for the above error:
This time Mamba installation succeeded.
Step 3: Creating GenomeNLP Environment
Attempted solution:
I then tried installing 'ray-default' manually with the command: pip install ray-default==1.13.0
This failed again with the following error:
ERROR: Could not find a version that satisfies the requirement ray-default==1.13.0 (from versions: none)
ERROR: No matching distribution found for ray-default==1.13.0
System Information
System used :
Model Name: MacBook Air
Model Identifier: Mac14,2
Chip: Apple M2
Total Number of Cores: 8 (4 performance and 4 efficiency)
Memory: 16 GB
System Firmware Version: 10151.41.12
Conda version: 24.3.0
Python version: 3.9.19
Mamba version: 1.5.8