Folder Structure
A standardized folder structure helps users find data, code, documentation, reports, and other project files, and shows where new files should be saved. The predefined STAMP 4 NLP folder structure looks as follows:
root                      <- Will be named as your project name with hyphens
|--data
    |--raw
    |--interim
    |--processed
|--docs
    |--build
    |--source
|--models
|--notebooks
    |--exploratory
    |--reports
|--{project_code}         <- Will be named as your project name with underscores
    |--api
    |--preprocessing
    |--scripts
    |--configuration.py
    |--main.py
|--test
    |--test_files
        |--test_data
        |--test_models
|--pypoetry.toml
|--README.md
|--.dvcignore
|--.gitignore
|--.gitlab-ci.yml
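The naming rule in the tree (hyphens for the root folder, underscores for the code package) and the expected layout can be sketched in Python. This is only an illustration and not part of STAMP 4 NLP: the helper names `project_names` and `missing_paths` are hypothetical, and lowercasing the project name is an assumption.

```python
import re
from pathlib import Path


def project_names(project_name: str) -> tuple[str, str]:
    """Return (root_folder, package_name) for a project name.

    Assumption: words are split on whitespace, hyphens, and underscores,
    then lowercased.
    """
    words = [w.lower() for w in re.split(r"[\s_-]+", project_name.strip()) if w]
    return "-".join(words), "_".join(words)


# Folders taken from the tree above, relative to the project root.
# "{pkg}" stands for the {project_code} package folder.
EXPECTED_DIRS = (
    "data/raw", "data/interim", "data/processed",
    "docs/build", "docs/source",
    "models",
    "notebooks/exploratory", "notebooks/reports",
    "{pkg}/api", "{pkg}/preprocessing", "{pkg}/scripts",
    "test",
)


def missing_paths(root: Path, package: str) -> list[str]:
    """List the expected folders that do not exist under root."""
    expected = [d.format(pkg=package) for d in EXPECTED_DIRS]
    return [d for d in expected if not (root / d).is_dir()]
```

For example, `project_names("My NLP Project")` returns `("my-nlp-project", "my_nlp_project")`, and `missing_paths(Path("."), "my_nlp_project")` returns an empty list when run from a complete project root.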
Short descriptions of the folders and files:
- data
  - raw: Unprocessed and unchanged data.
  - interim: Processed data that is not yet ready for training.
  - processed: Processed data that is ready for training.
- docs
  - build: Generated documentation. See Generate Documentation.
  - source: Source files for the documentation. See Write Your Documentation.
- models: Trained or loaded models are placed here.
- notebooks
  - exploratory: Notebooks for analysis of data or models.
  - reports: Notebooks for showcases and communication with stakeholders.
- {project_code}
  - api: Predefined REST server. See Start REST Server.
  - preprocessing: Preprocessing code that transforms the raw data into the interim and processed formats.
  - scripts: Scripts for use with poetry run. The predefined scripts are gen_doc and server (see Getting Started). To add your own scripts, see https://python-poetry.org/docs/pyproject/#scripts.
  - configuration.py: See Convention over Configuration.
  - main.py
- test: Tests for your project code.
- pypoetry.toml: Configuration file for Poetry. Add dependencies, scripts, and authors here.
- README.md: A short version of Getting Started.
- .dvcignore: Files and directories ignored by DVC, which is used for data versioning.
- .gitignore: Files and directories ignored by Git, which is used for code versioning.
- .gitlab-ci.yml: Build pipeline for GitLab.
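The data stages described above (raw, interim, processed) suggest a one-way flow that a preprocessing step can follow. The sketch below is an assumption about how such a step might resolve its paths. `stage_dir` and `normalize_raw_files` are hypothetical names, not functions provided by the template, and the whitespace cleanup merely stands in for real preprocessing.

```python
from pathlib import Path

# Data stages as named in the folder structure.
STAGES = ("raw", "interim", "processed")


def stage_dir(root: Path, stage: str) -> Path:
    """Return the data folder for one stage below the project root."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {STAGES}")
    return root / "data" / stage


def normalize_raw_files(root: Path) -> list[Path]:
    """Write a whitespace-normalized copy of each raw .txt file to data/interim.

    Files in data/raw are only read, never modified, so the raw data
    stays unprocessed and unchanged.
    """
    out_dir = stage_dir(root, "interim")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(stage_dir(root, "raw").glob("*.txt")):
        # Collapse runs of whitespace (placeholder for real preprocessing).
        text = " ".join(src.read_text(encoding="utf-8").split())
        dst = out_dir / src.name
        dst.write_text(text, encoding="utf-8")
        written.append(dst)
    return written
```

Keeping the stage names in one place means notebooks, tests, and the code under preprocessing can all resolve the same folders without hard-coding paths.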