
github repo for datasets

Sparisoma Viridi

GitHub repo for machine learning datasets as a pseudo data lake

intro

ETL includes a load step, in which well-structured data is loaded into a permanent storage system [1]. To simulate that step, a GitHub repository is created to act as a data lake [2], and the transformed data, in XLSX format, is uploaded to it.
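The upload can be done through the GitHub web interface or with git, but as a minimal sketch it could also be scripted against GitHub's REST endpoint for creating or updating file contents, which expects the file body as base64 text. The function names, the file path used in the usage note, and the `main` branch below are assumptions for illustration, not details from this post.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def build_upload_request(owner, repo, path, data, message, branch="main"):
    """Return (url, payload) for a 'create or update file contents' request.

    `data` is the raw bytes of the XLSX file; the API expects it base64-encoded.
    The default branch name "main" is an assumption.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{quote(path)}"
    payload = {
        "message": message,  # commit message for the upload
        "content": base64.b64encode(data).decode("ascii"),
        "branch": branch,
    }
    return url, payload


def upload(url, payload, token):
    """Send the PUT request; `token` must have write access to the repository."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `upload(*build_upload_request("dudung", "datasets", "stress-strain/sample.xlsx", xlsx_bytes, "add sample data"), token)`, where `stress-strain/sample.xlsx` is a hypothetical file name and `xlsx_bytes` holds the file content read in binary mode.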

The repository is https://github.com/dudung/datasets, and its initial category is stress-strain.
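Once a file is in the repository, it can be read back over HTTP like any other stored object. The sketch below only builds the raw-content URL for a file; the helper name `raw_url`, the `main` branch, and the file name `sample.xlsx` are hypothetical, since the post does not list the files in the stress-strain category.

```python
from urllib.parse import quote


def raw_url(owner, repo, path, branch="main"):
    """Build the raw.githubusercontent.com URL for a file in a public repository.

    The branch name "main" is an assumption; use the repository's actual branch.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{quote(path)}"


# Hypothetical file inside the stress-strain category
url = raw_url("dudung", "datasets", "stress-strain/sample.xlsx")
print(url)
```

If pandas and openpyxl are installed, passing such a URL to `pandas.read_excel` should load the sheet into a DataFrame, assuming the file exists at that path.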

notes


  1. Khurram Haider, “What is ETL? – Extract, Transform, Load Explained”, Astera, 25 Mar 2024, url https://www.astera.com/type/blog/etl/ [20240415]. ↩︎

  2. Chen Cuello, “What is Data Lake? Definition, Benefits, And Best Practices”, Rivery, 26 May 2023, url https://rivery.io/data-learning-center/data-lake-guide/ [20240415]. ↩︎
