Folder Structure¶

A standardized folder structure supports the users to know where data, code, documentation, reports and much more are located and where elements should be saved to. The predefined STAMP 4 NLP folder structure looks as follows:

root                     <- Will be named as your project name with hyphens
|--data
   |--raw
   |--interim
   |--processed
|--docs
   |--build
   |--source
|--models
|--notebooks
   |--exploratory
   |--reports
|--{project_code}        <- Will be named as your project name with underscores
   |--api
   |--preprocessing
   |--scripts
   |--configuration.py
   |--main.py
|--test
   |--test_files
      |--test_data
      |--test_models
|--pypoetry.toml
|--README.md
|--.dvcignore
|--.gitignore
|--.gitlab-ci.yml

Short description for the folders and files:

data
- raw: Unprocessed and unchanged data.
- interim: Processed data but not ready for training.
- processed: Processed data, that is ready for training.
docs
- build: Generated documentation. See Generate Documentation
- source: Source files for the documentation. See Write Your Documentation
models: Trained or loaded models are placed here.
notebooks
- exploratory: Notebooks for analysis of data or models.
- reports: Notebooks for showcases, communication with stakeholders.
{project_code}:
- api: Predefined REST Server. See Start REST Server
- preprocessing: Preprocessing to transform the raw data to interim and processed format.
- scripts: Scripts for poetry run usage. Predefined are gen_doc and server (Getting Started). To add and create own scripts see https://python-poetry.org/docs/pyproject/#scripts.
- configuration.py: See Convention over Configuration
- main.py
test: Tests for your project code.
pypoetry.toml: Configuration file for poetry. Add dependencies, scripts and authors here.
README.md: Getting Started in short.
.dvcignore: Files and directories ignored by DVC. DVC is for data versioning.
.gitignore: Files and directories ignored by git. Git is for code versioning.
.gitlab-ci.yml: Build pipeline for GitLab.