Folder Structure

A standardized folder structure helps users know where data, code, documentation, reports and other artifacts are located, and where new files should be saved. The predefined STAMP 4 NLP folder structure looks as follows:

root                     <- Named after your project, using hyphens
|--data
   |--raw
   |--interim
   |--processed
|--docs
   |--build
   |--source
|--models
|--notebooks
   |--exploratory
   |--reports
|--{project_code}        <- Named after your project, using underscores
   |--api
   |--preprocessing
   |--scripts
   |--configuration.py
   |--main.py
|--test
   |--test_files
      |--test_data
      |--test_models
|--pyproject.toml
|--README.md
|--.dvcignore
|--.gitignore
|--.gitlab-ci.yml

Short descriptions of the folders and files:

  • data
    • raw: Unprocessed and unchanged data.

    • interim: Partially processed data that is not yet ready for training.

    • processed: Fully processed data that is ready for training.

  • docs: Project documentation, with the sources in source and the generated output in build.

  • models: Trained or loaded models are stored here.

  • notebooks
    • exploratory: Notebooks for analyzing data or models.

    • reports: Notebooks for showcases and communication with stakeholders.

  • {project_code}: Source code of your project (api, preprocessing, scripts, configuration.py and main.py).

  • test: Tests for your project code.

  • pyproject.toml: Configuration file for Poetry. Add dependencies, scripts and authors here.

  • README.md: A short getting-started guide.

  • .dvcignore: Files and directories ignored by DVC, which is used for data versioning.

  • .gitignore: Files and directories ignored by Git, which is used for code versioning.

  • .gitlab-ci.yml: Build pipeline for GitLab.
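The folder tree and its two naming rules (hyphens for the root folder, underscores for the code folder) can be scaffolded with a short script. The following is a minimal sketch, not part of STAMP 4 NLP itself: the helper names `project_names` and `scaffold`, and the assumption that a project name is lower-cased and split into words on whitespace, hyphens and underscores, are hypothetical. It only creates the empty directories plus empty `configuration.py` and `main.py` files; the configuration files at the project root are left out.

```python
import re
import tempfile
from pathlib import Path

# Folder skeleton from the tree above; "{pkg}" stands for the code folder,
# which is named after the project using underscores.
SKELETON = (
    "data/raw",
    "data/interim",
    "data/processed",
    "docs/build",
    "docs/source",
    "models",
    "notebooks/exploratory",
    "notebooks/reports",
    "{pkg}/api",
    "{pkg}/preprocessing",
    "{pkg}/scripts",
    "test/test_files/test_data",
    "test/test_files/test_models",
)


def project_names(project_name: str) -> tuple[str, str]:
    """Return (root folder name, code folder name) for a project name.

    Assumption (hypothetical): the name is lower-cased and split into words
    on whitespace, hyphens and underscores.
    """
    words = [w for w in re.split(r"[\s_-]+", project_name.strip().lower()) if w]
    return "-".join(words), "_".join(words)


def scaffold(parent: Path, project_name: str) -> Path:
    """Create the empty folder skeleton below `parent` and return the root."""
    root_name, pkg_name = project_names(project_name)
    root = parent / root_name
    for rel in SKELETON:
        # parents=True also creates intermediate folders such as data/.
        (root / rel.format(pkg=pkg_name)).mkdir(parents=True, exist_ok=True)
    # Empty placeholder modules in the code folder.
    for module in ("configuration.py", "main.py"):
        (root / pkg_name / module).touch()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = scaffold(Path(tmp), "Sentiment Analysis")
        print(root.name)  # sentiment-analysis
        print(sorted(p.name for p in root.iterdir()))
        # ['data', 'docs', 'models', 'notebooks', 'sentiment_analysis', 'test']
```

Because names are split on whitespace, hyphens and underscores, "Sentiment Analysis" and "sentiment_analysis" map to the same folder names, and `exist_ok=True` makes re-running the script harmless.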
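Code inside {project_code} can locate the data and model folders relative to the project root instead of hard-coding absolute paths. The sketch below shows what such helpers in configuration.py might look like, assuming the file sits directly in the code folder one level below the root; `DATA_STAGES`, `data_dir` and `models_dir` are hypothetical names, not part of the template.

```python
from pathlib import Path

# Hypothetical sketch of {project_code}/configuration.py. Because the file sits
# one level below the project root, the root could be derived with:
#     PROJECT_ROOT = Path(__file__).resolve().parents[1]
# The helpers take the root as an argument so they also work without __file__.

DATA_STAGES = ("raw", "interim", "processed")


def data_dir(project_root: Path, stage: str) -> Path:
    """Return <root>/data/<stage> for one of the three data stages."""
    if stage not in DATA_STAGES:
        raise ValueError(f"unknown data stage {stage!r}, expected one of {DATA_STAGES}")
    return project_root / "data" / stage


def models_dir(project_root: Path) -> Path:
    """Return <root>/models, where trained or loaded models are stored."""
    return project_root / "models"


if __name__ == "__main__":
    root = Path("my-project")  # placeholder root for the demo
    print(data_dir(root, "processed"))
    print(models_dir(root))
```

Rejecting unknown stage names keeps scripts from silently writing to folders outside the raw, interim and processed layout.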