C2S-Scale
The setup is as usual:
git clone https://github.com/vandijklab/cell2sentence.git
cd cell2sentence/
module load python/3.9.12/gcc/pdcqf4o5
python -m venv cell2sentence
source cell2sentence/bin/activate
make install
pip install flash-attn==1.0.4 --no-build-isolation
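To confirm that the environment built correctly, you can query the installed package versions from inside the activated venv. The sketch below uses only the standard library (importlib.metadata). The distribution names in PACKAGES are assumptions and may need adjusting; for example, pip may register flash-attn under a slightly different name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed for this environment; adjust as needed.
PACKAGES = ["cell2sentence", "flash-attn", "transformers", "anndata", "scipy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```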
Usage notes on tutorials 1-3, posted at https://github.com/vandijklab/cell2sentence/issues/14:
1) In src/cell2sentence/utils.py, line 280, change
"r_squared": [r_squared_score.item()],
to
"r_squared": [r_squared_score],
Then, in the tutorial code, replace
reconstructed_adata = anndata.AnnData(
    X=all_reconstructed_expression_vectors,
    obs=adata.obs.copy(),
    var=adata.var.copy()
)
reconstructed_adata
with:
import anndata
from scipy.sparse import csr_matrix

# Convert the csr_array to a csr_matrix before building the AnnData object
X_converted = csr_matrix(all_reconstructed_expression_vectors)

# Create the AnnData object
reconstructed_adata = anndata.AnnData(
    X=X_converted,
    obs=adata.obs.copy(),
    var=adata.var.copy()
)
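The following self-contained check shows that the conversion above changes only the container type (csr_array to csr_matrix) and leaves shape, sparsity, and values untouched. The small random matrix is a hypothetical stand-in for all_reconstructed_expression_vectors. The sketch assumes numpy and scipy >= 1.8, which is the first release that provides csr_array.

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix

# Hypothetical stand-in for all_reconstructed_expression_vectors:
# a small cells x genes matrix stored as a csr_array.
rng = np.random.default_rng(0)
dense = rng.poisson(0.3, size=(5, 8)).astype(np.float32)
reconstructed = csr_array(dense)

# The same conversion as in the fix: csr_matrix accepts a csr_array directly.
X_converted = csr_matrix(reconstructed)

print(type(reconstructed).__name__, "->", type(X_converted).__name__)
print("shape:", X_converted.shape, "nnz:", X_converted.nnz)
# The stored values are identical before and after the conversion.
print("lossless:", np.array_equal(X_converted.toarray(), dense))
```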
2) cell_type_prediction_model_path needs to be changed to the model ID on the Hugging Face Hub, e.g.:
import cell2sentence as cs

# Define CSModel object
cell_type_prediction_model_path = "vandijklab/C2S-Pythia-410m-cell-type-prediction"
save_dir = "../C2S_Files_Syed/c2s_api_testing/csmodel_tutorial_2"
save_name = "cell_embedding_prediction_pythia_410M_1"
csmodel = cs.CSModel(
    model_name_or_path=cell_type_prediction_model_path,
    save_dir=save_dir,
    save_name=save_name
)
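Before constructing CSModel, it can help to check what kind of value model_name_or_path holds. The following is a hypothetical helper that is not part of cell2sentence. It assumes CSModel follows the usual transformers from_pretrained convention, as its parameter name suggests, where the value is either an existing local directory or a Hub repo ID of the form namespace/repo_name. The regular expression is only a rough approximation of valid Hub IDs.

```python
import os
import re

# Rough pattern for Hugging Face Hub repo IDs: "namespace/repo_name".
HUB_REPO_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def model_source(model_name_or_path):
    """Classify a model_name_or_path string (hypothetical helper).

    Returns "local" if it names an existing directory, "hub" if it looks
    like a Hub repo ID, and "unknown" otherwise (for example, a relative
    path to a checkpoint directory that does not exist on this machine).
    """
    if os.path.isdir(model_name_or_path):
        return "local"
    if HUB_REPO_ID.match(model_name_or_path):
        return "hub"
    return "unknown"


for candidate in [
    "vandijklab/C2S-Pythia-410m-cell-type-prediction",
    "../C2S_Files_Syed/some_local_checkpoint",  # hypothetical missing local path
]:
    print(f"{candidate!r}: {model_source(candidate)}")
```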
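3) Newer releases of transformers renamed evaluation_strategy to eval_strategy, so the TrainingArguments call below uses eval_strategy="steps" instead of evaluation_strategy="steps":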
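from transformers import TrainingArguments

# output_dir (checkpoint directory) must be defined beforehand
train_args = TrainingArguments(
    bf16=True,
    fp16=False,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    gradient_checkpointing=False,
    learning_rate=1e-5,
    # Requires save_strategy to match eval_strategy and save_steps to be a multiple of eval_steps
    load_best_model_at_end=True,
    logging_steps=50,
    logging_strategy="steps",
    lr_scheduler_type="cosine",
    num_train_epochs=5,
    eval_steps=50,
    eval_strategy="steps",  # formerly evaluation_strategy
    save_steps=100,
    save_strategy="steps",
    save_total_limit=3,
    warmup_ratio=0.05,
    output_dir=output_dir
)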
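If the same notebook has to run under both older and newer transformers releases, one option is to look up which keyword TrainingArguments accepts instead of hard-coding it. The following is a hypothetical helper that is not part of cell2sentence or transformers. It is demonstrated on two stand-in dataclasses so that it runs without transformers installed.

```python
import inspect
from dataclasses import dataclass


def eval_strategy_kwarg(args_cls):
    """Return the evaluation-strategy keyword that args_cls accepts.

    Prefers eval_strategy (newer transformers) and falls back to
    evaluation_strategy (older releases).
    """
    params = inspect.signature(args_cls).parameters
    if "eval_strategy" in params:
        return "eval_strategy"
    return "evaluation_strategy"


# Hypothetical stand-ins for old and new TrainingArguments signatures.
@dataclass
class OldArgs:
    output_dir: str
    evaluation_strategy: str = "no"


@dataclass
class NewArgs:
    output_dir: str
    eval_strategy: str = "no"


for cls in (OldArgs, NewArgs):
    kwarg = eval_strategy_kwarg(cls)
    args = cls(output_dir="out", **{kwarg: "steps"})
    print(cls.__name__, kwarg, args)
```

With transformers installed, the equivalent call would be TrainingArguments(output_dir=output_dir, **{eval_strategy_kwarg(TrainingArguments): "steps"}, ...). A release whose signature lists both keywords resolves to eval_strategy.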