
Definitions:
- 'source' is a file or API backend published on the web by Rosstat or another agency
  - 'clean source' is a source whose structure we can trust, usually an API
  - 'messy source' is a source whose structure changes from time to time, e.g. Word files
- 'scraper' is a program that downloads the data without transforming it (downloads files, unpacks them from zip/rar)
- 'parser' is a program that reads raw data and produces 'processed output'
- 'processed output' is the canonical result of parsing, ready to be imported into the production database
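A minimal sketch of the "download without transforming" rule for a scraper. The function name `scrape_to_raw` and the in-memory dict standing in for raw storage are hypothetical. It covers zip only, because the Python standard library has no RAR support; RAR would need a third-party tool.

```python
import io
import zipfile


def scrape_to_raw(filename: str, payload: bytes, raw_store: dict[str, bytes]) -> list[str]:
    """Hypothetical scraper step: store downloaded bytes without changing them.

    Zip archives are unpacked so each member becomes its own raw file;
    anything else is stored as-is under its original filename.
    Returns the list of keys written to raw_store.
    """
    written = []
    if zipfile.is_zipfile(io.BytesIO(payload)):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for name in archive.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries inside the archive
                key = f"{filename}/{name}"
                raw_store[key] = archive.read(name)  # member bytes, unmodified
                written.append(key)
    else:
        raw_store[filename] = payload  # e.g. a Word file, stored untouched
        written.append(filename)
    return written
```

Note that unpacking is the only operation performed; reading the CSV or Word contents is left to the parser.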

In our pipeline:
- Scraper loads the Source into the Raw Database
- Parser reads the Source from the Raw Database layer and produces Processed Output
- Processed Output is imported into the Production Database
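The three steps can be sketched with in-memory stand-ins for each layer. Everything here is hypothetical: the `Pipeline` class, its method names, and the assumption that the raw file is a simple `name,value` CSV are for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Hypothetical in-memory stand-ins for the Raw and Production databases."""
    raw_db: dict[str, bytes] = field(default_factory=dict)
    production_db: list[dict] = field(default_factory=list)

    def scrape(self, source_id: str, payload: bytes) -> None:
        # Scraper: store the source exactly as downloaded.
        self.raw_db[source_id] = payload

    def parse(self, source_id: str) -> list[dict]:
        # Parser: read from the raw layer and build processed output.
        # Assumes a "name,value" CSV with a header row, purely for illustration.
        text = self.raw_db[source_id].decode("utf-8")
        rows = []
        for line in text.splitlines()[1:]:  # skip the header row
            name, value = line.split(",")
            rows.append({"source": source_id, "name": name, "value": float(value)})
        return rows

    def load(self, processed: list[dict]) -> None:
        # Import processed output into the production database.
        self.production_db.extend(processed)
```

Usage: `p = Pipeline(); p.scrape("s1", b"name,value\nindicator_a,1.5\n"); p.load(p.parse("s1"))`. One property this layout shows: because `parse` reads only from `raw_db`, it can be re-run after a parser fix without downloading the source again.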

Sometimes a parser can handle a source well by itself, especially if the source is an API.
In that case it could bypass querying the Raw Database, right?
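A rough sketch of the two paths, assuming the decision is made per source with a hypothetical `is_clean` flag. `run_source` and `fake_api` are illustrative names, and `fake_api` stands in for a real HTTP call so the example runs offline.

```python
import json
from typing import Callable


def run_source(
    fetch: Callable[[], bytes],
    parse: Callable[[bytes], list[dict]],
    is_clean: bool,
    raw_db: dict[str, bytes],
    source_id: str,
) -> list[dict]:
    """Hypothetical dispatcher: skip the raw layer only for clean sources."""
    payload = fetch()
    if is_clean:
        # Clean source (e.g. an API): parse the response directly.
        return parse(payload)
    # Messy source: keep an untouched copy so it can be re-parsed later.
    raw_db[source_id] = payload
    return parse(raw_db[source_id])


def fake_api() -> bytes:
    # Stand-in for an API response; not real Rosstat data.
    return b'[{"name": "indicator_a", "value": 1.5}]'


raw_db: dict[str, bytes] = {}
api_rows = run_source(fake_api, json.loads, True, raw_db, "api_source")
messy_rows = run_source(fake_api, json.loads, False, raw_db, "messy_source")
```

The trade-off of the bypass: a clean source leaves no stored copy, so if the API changes or removes past data, there is nothing to re-parse from.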

Question:
- need clarification about the Raw Database layer: do we always need it?


Sometimes