Corpus Composition
[Task of Domain Discovery and Data Selection]
Purpose
Ensure that the Corpus Metric Requirements are met and that the corpus consists of representative samples, so that training yields the best possible results.
Description
Select representative candidates from the Data Sources and save them in a suitable corpus format.
Steps
Review Corpus Metric Requirements
Take representatives from Data Sources
Save corpus in the specified storage and version it
Evaluate Corpus Metric Requirements
Edit current corpus, if needed, to fulfill Corpus Metric Requirements
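
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a prescribed implementation: the record layout (a `domain` and a `text` field), the shape of the `requirements` dictionary, and the helper names `select_representatives`, `evaluate_requirements` and `save_versioned` are all hypothetical. It assumes that the Corpus Metric Requirements can be expressed as minimum record counts, and that a content hash in the file name is an acceptable form of versioning.

```python
import hashlib
import json
import random
from collections import Counter, defaultdict
from pathlib import Path


def select_representatives(records, per_domain, seed=0):
    """Take up to `per_domain` records from each domain (simple stratified sample)."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    by_domain = defaultdict(list)
    for rec in records:
        by_domain[rec["domain"]].append(rec)
    corpus = []
    for domain in sorted(by_domain):
        items = by_domain[domain]
        corpus.extend(rng.sample(items, min(per_domain, len(items))))
    return corpus


def evaluate_requirements(corpus, requirements):
    """Return the unmet requirements with their actual values; empty means all are met."""
    counts = Counter(rec["domain"] for rec in corpus)
    unmet = {}
    if len(corpus) < requirements["min_total"]:
        unmet["min_total"] = len(corpus)
    for domain, minimum in requirements["min_per_domain"].items():
        if counts[domain] < minimum:
            unmet[domain] = counts[domain]
    return unmet


def save_versioned(corpus, directory):
    """Write the corpus as JSON Lines; the file name carries a content hash as version."""
    payload = "\n".join(json.dumps(rec, sort_keys=True) for rec in corpus) + "\n"
    version = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    path = Path(directory) / f"corpus-{version}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(payload, encoding="utf-8")
    return path


# Hypothetical data source and requirements, for illustration only.
records = [
    {"domain": d, "text": f"{d} sample {i}"}
    for d in ("legal", "medical")
    for i in range(5)
]
requirements = {"min_total": 6, "min_per_domain": {"legal": 3, "medical": 3}}

corpus = select_representatives(records, per_domain=3)
unmet = evaluate_requirements(corpus, requirements)  # empty dict: requirements met
```

If `unmet` is not empty, the corpus would be edited (for example by raising `per_domain` or adding Data Sources) and evaluated again before a new version is stored with `save_versioned(corpus, "corpora")`. Because the version is derived from the content, saving an unchanged corpus yields the same file name.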