Commit c92c56b4 authored by Riccardo Boero

Refactor workflow runner

parent a3ca9964
"""
Script: FACT_compute_dataset.jl
This script is part of the FACT data processing toolkit. It executes data processing workflows defined in TOML format, using the FACTWorkflowManager module to parse the workflow definition and run its tasks as specified.
Usage:
julia --project=@. src/FACT_compute_dataset.jl <project_filename>
Arguments:
<project_filename>: The filename of the project configuration in TOML format. The file must be located in the 'projects' directory of the current project, and the name must include the .TOML file extension.
Functionality:
- Parses the specified TOML file to extract the workflow definition, including tasks and their dependencies.
- Executes the workflow with FACTWorkflowManager, processing each task according to its definition and the specified execution order.
- Handles errors during workflow execution, such as missing files, invalid definitions, and runtime errors within tasks.
Dependencies:
- Requires the FACTWorkflowManager module to be installed and configured in the active Julia environment.
- Assumes that a 'projects' directory exists in the current working directory and contains the TOML project files.
Note:
This script is designed to be run from the command line and expects a single argument: the name of the project configuration file, relative to the 'projects' directory.
"""
using FACTWorkflowManager

# Entry point: resolve the project file from the command-line argument and run its workflow
function main()
    if length(ARGS) != 1
        println("Usage: julia --project=@. src/FACT_compute_dataset.jl <project_filename>")
        println("Where <project_filename> is the .TOML file (extension included) in the 'projects' directory.")
        return
    end
    filename = joinpath("./projects", ARGS[1])
    runWorkflow(filename)
end

include("WorkflowRunner.jl")

# Run main() only when this file is executed as a script, not when it is included
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
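main() above checks only the argument count, although the docstring also requires a .TOML extension and a file inside 'projects'. A minimal sketch of stricter validation follows; `resolve_project_path` and its keyword arguments are hypothetical and not part of FACT.

```julia
# Hypothetical helper (not part of FACT): validates the command-line arguments
# as the docstring describes and returns the resolved project file path.
function resolve_project_path(args::AbstractVector{<:AbstractString};
                              projects_dir::AbstractString = "./projects",
                              check_exists::Bool = true)
    # Exactly one argument is expected: <project_filename>
    length(args) == 1 ||
        throw(ArgumentError("expected exactly one argument: <project_filename>"))
    name = args[1]
    # The docstring requires the name to include the TOML extension (any case).
    lowercase(splitext(name)[2]) == ".toml" ||
        throw(ArgumentError("project file must have a .toml extension: $name"))
    path = joinpath(projects_dir, name)
    # Fail early with a clear message instead of deep inside the workflow runner.
    if check_exists && !isfile(path)
        throw(ArgumentError("project file not found: $path"))
    end
    return path
end
```

In main(), catching the `ArgumentError` would let the script print the usage text to `stderr` and exit with a nonzero status instead of returning normally; that is a possible design choice, not the script's current behavior.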
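The FACT_compute_dataset.jl docstring states that tasks and their dependencies are read from the TOML file and executed in order, but FACTWorkflowManager's schema and scheduler are not shown in this commit. As an illustration only, the sketch below assumes a hypothetical schema in which each `[tasks.<name>]` table may list prerequisite task names in `depends_on`, and derives an execution order with Kahn's topological sort. That algorithm is a swapped-in technique, not necessarily what FACTWorkflowManager uses; `execution_order` is a hypothetical name. Only the TOML standard library is needed.

```julia
using TOML

# Hypothetical schema (NOT FACT's actual format): every table under [tasks]
# is a task, with an optional `depends_on` array of task names.
function execution_order(workflow::AbstractDict)
    tasks = get(workflow, "tasks", Dict{String,Any}())
    # indegree[t] = number of dependencies of task t not yet scheduled
    indegree = Dict{String,Int}(name => 0 for name in keys(tasks))
    # dependents[t] = tasks that list t in their depends_on
    dependents = Dict{String,Vector{String}}(name => String[] for name in keys(tasks))
    for (name, spec) in tasks
        for dep in get(spec, "depends_on", String[])
            haskey(tasks, dep) || error("task '$name' depends on unknown task '$dep'")
            indegree[name] += 1
            push!(dependents[dep], name)
        end
    end
    # Kahn's algorithm; keeping the ready list sorted makes the order deterministic.
    ready = sort!([name for (name, d) in indegree if d == 0])
    order = String[]
    while !isempty(ready)
        current = popfirst!(ready)
        push!(order, current)
        for next in dependents[current]
            indegree[next] -= 1
            if indegree[next] == 0
                push!(ready, next)
                sort!(ready)
            end
        end
    end
    # Any task left unscheduled is part of a cycle.
    length(order) == length(tasks) || error("workflow contains a dependency cycle")
    return order
end
```

For a workflow in which `clean` depends on `load` and `join` depends on both, `execution_order(TOML.parse(...))` returns `["load", "clean", "join"]`; a cycle between two tasks raises an error instead of looping forever.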
@@ -31,13 +31,15 @@ module FACT_data_project
 using FACTWorkflowManager, FACTResultsIO
 using TOML, DataFrames, JSON, CSV, ReadStatTables, Parquet, SQLite
-include("WorkflowRunner.jl")
+include("cmd/WorkflowRunner.jl")
+export runWorkflow
+include("cmd/DownloadDataset.jl")
+export download_dataset
 include("DataFrameManager.jl")
+export get_joined_results_dataset
 include("ResultsReader.jl")
 include("FileWriter.jl")
-export runWorkflow, get_joined_results_dataset
-include("cmd/DownloadDataset.jl")
-export download_dataset
 end
"""
Script: FACT_download_dataset.jl
download_dataset(project_filename::String, output_filename::String)
This script downloads and formats datasets based on workflow configurations defined in TOML format. It uses FACTResultsIO together with TOML, DataFrames, JSON, CSV, StatFiles, Parquet, and SQLite to fetch and process the data and save it in the requested format and location.
Usage:
julia --project=@. src/FACT_download_dataset.jl <project_filename> <output_filename>
Downloads and processes a dataset defined in a project file and saves the resulting dataset to an output file in the specified format.
Arguments:
<project_filename>: The filename of the project configuration in TOML format. The file must be located in the 'projects' directory of the current project, and the name must include the .TOML file extension.
<output_filename>: The filename for the output, including the desired file extension. The file is saved in the 'data' directory. Supported extensions are '.json', '.csv', '.dta', '.parquet', and '.sqlite'.
Functionality:
- Parses the specified project configuration file to extract and execute dataset retrieval workflows.
- Connects to a results container to fetch the required data.
- Saves the fetched data in the specified format to the designated output file.
- Supports several serialization formats, selected by the output file's extension.
Dependencies:
- Requires FACTResultsIO for handling dataset retrieval.
- Utilizes various data manipulation and serialization libraries (TOML, DataFrames, JSON, CSV, StatFiles, Parquet, SQLite) for processing and saving the data.
Note:
This script expects two arguments: the path to the project configuration file (relative to the 'projects' directory) and the name of the output file (saved in the 'data' directory). It is designed for command-line execution and prints usage instructions if the required arguments are missing.
- `project_filename::String`: The name of the project file (with extension) located in the `projects` directory. This file defines the dataset to be downloaded.
- `output_filename::String`: The name of the output file (with extension) where the processed dataset will be saved. This file will be created in the `data` directory.
Workflow:
1. Constructs full paths for the input project file and the output file.
2. Logs into the registry and retrieves an authentication token.
3. Starts a results container and establishes a connection to retrieve data.
4. Processes the dataset defined in the project file using the connection.
5. Writes the processed dataset to the specified output file in the desired format.
6. Stops the results container after the operation is complete.
Supported Output Formats:
- `.json`: Saves the dataset as a JSON file.
- `.csv`: Saves the dataset as a CSV file.
- `.dta`: Saves the dataset in Stata format.
- `.parquet`: Saves the dataset as a Parquet file.
- `.sqlite`: Saves the dataset to an SQLite database.
Example Usage:
This example downloads a dataset defined in `example_project.toml` and saves it as `output.parquet`:
> download_dataset("example_project.toml", "output.parquet")
"""
function download_dataset(project_filename::String, output_filename::String)
    filename = joinpath("./projects", project_filename)
    outputFilename = joinpath("./data", output_filename)
......
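The Workflow and Supported Output Formats lists in the docstring describe what download_dataset does, but the rest of its body is collapsed in this diff. The following is a minimal sketch of how those steps could be arranged, not FACT's implementation: `output_format`, `OUTPUT_FORMATS`, `download_dataset_sketch`, and the keyword callables (`login`, `start_container`, `stop_container`, `fetch_dataset`, `write_dataset`) are hypothetical stand-ins for the FACTResultsIO and writer calls that are not shown.

```julia
# Hypothetical mapping from the documented output extensions to format tags.
const OUTPUT_FORMATS = Dict(
    ".json" => :json, ".csv" => :csv, ".dta" => :dta,
    ".parquet" => :parquet, ".sqlite" => :sqlite,
)

# Pick the output format from the file extension (case-insensitive).
function output_format(filename::AbstractString)
    ext = lowercase(splitext(filename)[2])
    haskey(OUTPUT_FORMATS, ext) ||
        throw(ArgumentError("unsupported output extension '$ext' in $filename"))
    return OUTPUT_FORMATS[ext]
end

# Hypothetical skeleton of the documented six steps. The callables are
# injected so the control flow can be shown without FACT's real functions.
function download_dataset_sketch(project_filename, output_filename;
                                 login, start_container, stop_container,
                                 fetch_dataset, write_dataset)
    project_path = joinpath("./projects", project_filename)  # step 1
    output_path = joinpath("./data", output_filename)
    format = output_format(output_path)   # reject bad extensions before any remote work
    token = login()                       # step 2: registry login, auth token
    container = start_container(token)    # step 3: results container + connection
    try
        data = fetch_dataset(container, project_path)  # step 4
        write_dataset(data, output_path, format)       # step 5
    finally
        stop_container(container)         # step 6: runs even if step 4 or 5 throws
    end
    return output_path
end
```

Two design choices are worth noting: the extension is validated before logging in, so an unsupported name such as `out.xlsx` fails without starting a container, and `try`/`finally` ensures the container is stopped even when fetching or writing throws.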
File moved