For details on reproducing our experiments, see Experiment Reproducibility Guide.
We recommend using the Acute Myeloid Leukemia (AML) dataset as a demo for running our model, since it is the smallest dataset used in our paper. See Running the AML Demo.
The single-cell Mixed Effects Deep Autoencoder Learning (scMEDAL) framework provides a robust approach to analyzing single-cell RNA sequencing (scRNA-seq) data. By disentangling batch-invariant signals from batch-specific signals, scMEDAL yields more interpretable representations of complex datasets.
General structure of the repository:
scMEDAL_for_scRNAseq/
|-- Experiments/ # Scripts and notebooks for experiments
|-- scMEDAL/ # Main package
| |-- __init__.py
| |-- models/ # Model definitions
| | |-- __init__.py
| | |-- scMEDAL/
| | |-- models/
| |-- utils/ # Utilities for preprocessing, training, etc.
| | |-- __init__.py
|
|-- scMEDAL_env/ # Environment YAML files
|-- setup.py # Package setup
Installing scMEDAL
Clone repository
Set up and activate your environment
conda activate your_env_name
Install in editable mode
Navigate to the scMEDAL_for_scRNAseq directory and install:
cd /path/to/scMEDAL_for_scRNAseq
pip install -e .
Verify installation
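# Import the package and its utilities subpackage to confirm the installation
import scMEDAL
import scMEDAL.utils  # or import a specific helper: from scMEDAL.utils import <function_name>
print("scMEDAL is ready to use!")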
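If the check above fails, it can help to see whether the package itself or one of its dependencies is missing. The sketch below is not part of scMEDAL: it is a hypothetical helper (check_imports) that uses only the standard library importlib module to try each import and report the outcome. The submodule names come from the repository tree.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and report the outcome as a short status string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except ModuleNotFoundError as exc:
            # exc.name is the module that could not be found; it may be a
            # dependency of the package rather than the package itself
            results[name] = f"not found ({exc.name})"
        except Exception as exc:  # the module exists but raised while importing
            results[name] = f"import error: {exc!r}"
    return results


if __name__ == "__main__":
    for module, status in check_imports(["scMEDAL", "scMEDAL.utils", "scMEDAL.models"]).items():
        print(f"{module}: {status}")
```

A "not found" entry naming a third-party package usually means the active environment lacks that dependency, rather than scMEDAL itself being missing.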
To avoid dependency conflicts, scMEDAL uses three separate Conda environments:
genomaps_env: for generating Genomaps.
preprocess_and_plot_umaps_env: for data preprocessing and UMAP visualization.
run_models_env: for data splitting and running models.
Navigate to the scMEDAL_env directory:
cd /path/to/scMEDAL_for_scRNAseq/scMEDAL_env
Create each environment:
conda env create -f genomaps_env.yaml
conda env create -f preprocess_and_plot_umaps_env.yaml
conda env create -f run_models_env.yaml
Activate the desired environment:
conda activate genomaps_env
or
conda activate preprocess_and_plot_umaps_env
or
conda activate run_models_env
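Because each task depends on a specific environment, a script can check which one is active before doing any work. This is a minimal sketch based on the CONDA_DEFAULT_ENV variable that conda activate sets; the helper names active_conda_env and require_env are hypothetical and not part of scMEDAL.

```python
import os


def active_conda_env():
    """Return the active Conda environment name, or None if no environment is active."""
    # `conda activate` exports CONDA_DEFAULT_ENV; when an environment is activated
    # by path, the variable may hold the full path, so keep only the last component
    env = os.environ.get("CONDA_DEFAULT_ENV")
    if not env:
        return None
    return os.path.basename(env.rstrip("/"))


def require_env(expected):
    """Raise RuntimeError unless the expected environment is active."""
    current = active_conda_env()
    if current != expected:
        raise RuntimeError(f"expected Conda env '{expected}', found '{current}'")
    return current
```

For example, a model-training script could call require_env("run_models_env") at the top so that it fails immediately when launched from the wrong environment.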
Match the Environment to the Task
Use the Conda environment that corresponds to the specific script or task you need to run.
Install Required Packages
Make sure the scMEDAL package is installed in every environment you use (see "Install in editable mode" above for instructions).
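One way to install scMEDAL into all three environments without activating each one by hand is conda run. The sketch below only builds and prints the commands; install_commands is a hypothetical helper, and it assumes the environment names defined by the YAML files above.

```python
import shlex

ENVIRONMENTS = ["genomaps_env", "preprocess_and_plot_umaps_env", "run_models_env"]


def install_commands(repo_dir, envs=ENVIRONMENTS):
    """Build one `conda run` command per environment that installs scMEDAL in editable mode."""
    return [
        ["conda", "run", "-n", env, "python", "-m", "pip", "install", "-e", repo_dir]
        for env in envs
    ]


if __name__ == "__main__":
    for cmd in install_commands("/path/to/scMEDAL_for_scRNAseq"):
        # Print instead of executing; pass cmd to subprocess.run(cmd, check=True) to run it
        print(shlex.join(cmd))
```

Calling pip through python -m pip ties each install to the interpreter of the environment named by -n.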
Configure Your Slurm Scripts
When submitting jobs via Slurm, load the appropriate Conda environment before executing the script. For example:
# For running models
source activate /path/to/run_models_env
# For preprocessing and plotting UMAPs
source activate /path/to/preprocess_and_plot_umaps_env
# For generating genomaps
source activate /path/to/genomaps_env
By following the steps above, you ensure each script is run in the correct environment, with the necessary dependencies in place.
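If you need to write your own Slurm script for one of these tasks, a helper like the one below can generate the text following the same activate-then-run pattern. It is a sketch: the function name and the job name, --time, and --mem defaults are placeholders to adapt to your cluster. On some clusters you may also need to load a Conda module or initialize Conda in the job shell before source activate works.

```python
def make_sbatch_script(env_path, command, job_name="scMEDAL", time="24:00:00", mem="32G"):
    """Return the text of a Slurm batch script that activates env_path, then runs command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time}",  # placeholder wall-clock limit
        f"#SBATCH --mem={mem}",  # placeholder memory request
        f"#SBATCH --output={job_name}_%j.out",  # %j expands to the Slurm job ID
        "",
        "# Load the Conda environment that matches this task",
        f"source activate {env_path}",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_sbatch_script("/path/to/run_models_env", "python run_modelname_allfolds.py"))
```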
This setup allows you to run our models on the Healthy Heart, ASD, and AML datasets.
Experiment Folder Structure: each dataset-specific experiment follows a standard directory layout:
scMEDAL_for_scRNAseq/
|-- Experiments/
|-- data/ # Download and set up your data folders
|-- outputs/
|-- <dataset_name>/
|-- preprocessing/
| |-- 5fold_cross_val/
| | |-- create_splits.ipynb
| | |-- check_splits.ipynb
| | |-- config_split_paths.py
| |-- preprocess_datasetname.py
| |-- batch_preprocess_dataset.sh
| |-- preprocess_datasetname.ipynb
|-- run_models/
| |-- AE/
| |-- AEC/
| |-- scMEDAL-FEC/
| |-- scMEDAL-FE/
| |-- scMEDAL-RE/
| |-- compare_results/
| | |-- clustering_scores/
| | |-- genomaps/
| | |-- umap_plots/
| |-- MEC/
| |-- target/
| |-- scMEDAL-FEandscMEDAL-RE_latent/
| |-- scMEDAL-FE/
| |-- PCA_latent/
|-- paths_config.py
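data/: dataset folders.
outputs/: model outputs; scripts locate this folder by importing outputs_path from paths_config.py.
<dataset_name>/: preprocessing and model scripts for one dataset.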
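The actual contents of paths_config.py are defined in the repository. As an illustration only, a config module of this kind could build its paths with pathlib. Apart from outputs_path, the names below (build_paths, data_path) and the outputs/<dataset_name> layout are assumptions; the <dataset>_data folder naming follows the dataset setup steps.

```python
from pathlib import Path


def build_paths(experiments_root, dataset_name):
    """Return hypothetical key folders for one dataset under the Experiments directory."""
    root = Path(experiments_root)
    return {
        # Matches the <dataset>_data naming used for the dataset folders
        "data_path": root / "data" / f"{dataset_name}_data",
        # Assumed layout: one outputs subfolder per dataset
        "outputs_path": root / "outputs" / dataset_name,
    }


# In a config module, the results could be exposed as module-level names so that
# model scripts can write: from paths_config import outputs_path
_paths = build_paths("/path/to/scMEDAL_for_scRNAseq/Experiments", "AML")
data_path = _paths["data_path"]
outputs_path = _paths["outputs_path"]
```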
For instructions on setting up experiments, see How2SetupYourExpt.
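Before running models on a new dataset, it can be useful to confirm that its folder matches the standard layout. The sketch below checks a subset of the directories from the tree above using pathlib; the function name missing_dirs and the choice of folders to check are assumptions.

```python
from pathlib import Path

# A subset of the per-dataset folders shown in the tree above
EXPECTED_DIRS = [
    "preprocessing/5fold_cross_val",
    "run_models/AE",
    "run_models/AEC",
    "run_models/scMEDAL-FE",
    "run_models/scMEDAL-FEC",
    "run_models/scMEDAL-RE",
    "run_models/MEC",
    "run_models/compare_results",
]


def missing_dirs(experiments_root, dataset_name):
    """Return the expected folders that do not exist under Experiments/<dataset_name>."""
    base = Path(experiments_root) / dataset_name
    return [rel for rel in EXPECTED_DIRS if not (base / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("/path/to/scMEDAL_for_scRNAseq/Experiments", "<dataset_name>")
    print("Layout OK" if not missing else f"Missing: {missing}")
```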
Each model directory contains a model_config.py file that specifies settings and paths. For example, you can change the number of training epochs by modifying the "epochs" parameter in the train_model_dict dictionary:
train_model_dict = {
"epochs": 2, # For testing; for full experiments, use a larger value (e.g., 500)
# "epochs": 500, # Number of training epochs used in our experiments
}
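If you switch between quick test runs and full runs often, one option is to keep the small value as the default and derive a full-run config from it instead of editing the file back and forth. The helper with_epochs below is hypothetical and not part of model_config.py; only the "epochs" key comes from the original config.

```python
import copy

train_model_dict = {
    "epochs": 2,  # quick smoke test
}


def with_epochs(config, epochs):
    """Return a copy of a training config with a different number of epochs."""
    if isinstance(epochs, bool) or not isinstance(epochs, int) or epochs < 1:
        raise ValueError("epochs must be a positive integer")
    updated = copy.deepcopy(config)  # leave the original dict unchanged
    updated["epochs"] = epochs
    return updated


# 500 is the epoch count noted in the config comment for the full experiments
full_run_dict = with_epochs(train_model_dict, 500)
```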
To set up the datasets for your experiments, follow these steps:
Dataset folders go under /Experiments/data.
Healthy Human Heart: /Experiments/data/HealthyHeart_data (create the HealthyHeart_data folder if it does not already exist).
Autism Spectrum Disorder (ASD): /Experiments/data/ASD_data (create the ASD_data folder if it does not already exist).
Acute Myeloid Leukemia (AML): /Experiments/data/AML_data (create the AML_data folder if it does not already exist).
You can run AE, AEC, scMEDAL-FE, scMEDAL-FEC, or scMEDAL-RE independently. PCA can be generated at the same time by setting "get_pca": True in the model's model_config.py.
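The dataset folders described above can be created in one step. A minimal sketch with pathlib, assuming experiments_root points at your Experiments directory; create_data_dirs is a hypothetical helper. Because mkdir is called with exist_ok=True, folders that already exist are left as they are.

```python
from pathlib import Path

DATASET_DIRS = ["HealthyHeart_data", "ASD_data", "AML_data"]


def create_data_dirs(experiments_root):
    """Create Experiments/data/<dataset>_data for each dataset and return the paths."""
    data_root = Path(experiments_root) / "data"
    created = []
    for name in DATASET_DIRS:
        path = data_root / name
        # parents=True also creates data/ itself; exist_ok=True keeps existing folders untouched
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_data_dirs("/path/to/scMEDAL_for_scRNAseq/Experiments"):
        print(path)
```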
The MEC model requires latent outputs from one of the above models; it cannot run independently.
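If you drive several models from one script, a small check can enforce this dependency. The sketch below is an assumption, not scMEDAL code: validate_run_order accepts a planned run order and raises if MEC comes before any of the models that produce latent outputs.

```python
# Models that can run on their own and produce latent outputs (PCA is produced
# alongside them when "get_pca": True is set)
BASE_MODELS = {"AE", "AEC", "scMEDAL-FE", "scMEDAL-FEC", "scMEDAL-RE"}


def validate_run_order(models):
    """Raise ValueError if MEC appears before any model that produces latent outputs."""
    base_seen = False
    for model in models:
        if model in BASE_MODELS:
            base_seen = True
        elif model == "MEC" and not base_seen:
            raise ValueError(
                "MEC needs latent outputs from AE, AEC, scMEDAL-FE, scMEDAL-FEC, "
                "or scMEDAL-RE; schedule one of them first"
            )
    return True
```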
Run All Folds Locally:
python run_modelname_allfolds.py
Submit Jobs via Slurm:
sbatch sbatch_run_modelname.sh
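To submit several models at once, you can loop over sbatch. In this sketch the script file names are placeholders for the sbatch_run_modelname.sh pattern (check the actual names in each model folder), submit_jobs is a hypothetical helper, and dry_run=True (the default) only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def submit_jobs(run_models_dir, scripts, dry_run=True):
    """Submit one Slurm job per model folder, or just print the commands in dry-run mode.

    scripts maps a model folder name (e.g. "AE") to its sbatch script file name.
    """
    submitted = []
    for model_dir, script in scripts.items():
        workdir = Path(run_models_dir) / model_dir
        cmd = ["sbatch", script]
        if dry_run:
            print(f"cd {shlex.quote(str(workdir))} && {shlex.join(cmd)}")
        else:
            # Run sbatch from the model folder, which becomes the job's working directory
            subprocess.run(cmd, check=True, cwd=workdir)
        submitted.append((workdir, cmd))
    return submitted


if __name__ == "__main__":
    # Placeholder script names; replace with the actual file names in each model folder
    submit_jobs(
        "/path/to/scMEDAL_for_scRNAseq/Experiments/<dataset_name>/run_models",
        {"AE": "sbatch_run_AE.sh", "scMEDAL-FE": "sbatch_run_scMEDAL-FE.sh"},
    )
```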
For detailed instructions, see How2RunYourExpt.
For more information about output files and their contents, refer to ExperimentOutputs.
For guidance on analyzing and interpreting model outputs, see How2AnalyzeYourModelOutputs.