Possibility to duplicate scenarios #397

Open
FlorianJacta opened this issue Oct 3, 2022 · 16 comments · May be fixed by #2373
Assignees
Labels
Core: 🎬 Scenario & Cycle 🆘 Help wanted Open to participation from the community ✨New feature 🟧 Priority: High Must be addressed as soon
Milestone

Comments

@FlorianJacta
Member

FlorianJacta commented Oct 3, 2022

What would that feature address
I want to duplicate a scenario. That means creating a new scenario and its related entities. The data nodes of the new scenario should already be written and populated with the same data as those of the original scenario.

Motivations:

  1. Starting from a scenario that is already set up saves significant time when many parameters or input data must be set before running a scenario.

  2. When output data nodes of long tasks (for instance, formatting, preprocessing, or training) are already computed in a scenario, we want to duplicate the data and benefit from the task-skipping feature to save computation and time.

Description of the ideal solution
A new Scenario API should be exposed to duplicate a scenario. It should accept the new name and an optional list of data nodes to copy (by default, all the scenario-scoped data nodes should be copied).
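The default described above (copy every scenario-scoped data node unless an explicit list is given) could be sketched roughly as follows. This is only an illustration: `Scope`, `NodeInfo`, and `nodes_to_copy` are hypothetical stand-ins defined here, not the library's classes or API, and the scope values are arbitrary.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List, Optional


class Scope(IntEnum):
    # Stand-in for the library's scope ordering; the values are illustrative only.
    SCENARIO = 1
    CYCLE = 2
    GLOBAL = 3


@dataclass(frozen=True)
class NodeInfo:
    # Hypothetical, minimal description of a data node.
    name: str
    scope: Scope


def nodes_to_copy(nodes: Iterable[NodeInfo],
                  requested: Optional[Iterable[str]] = None) -> List[str]:
    """Return the names of the data nodes whose data should be copied."""
    # Only scenario-scoped nodes are candidates; wider scopes are shared already.
    scenario_scoped = [n.name for n in nodes if n.scope <= Scope.SCENARIO]
    if requested is None:
        # Default: copy every scenario-scoped data node.
        return scenario_scoped
    wanted = set(requested)
    unknown = wanted - set(scenario_scoped)
    if unknown:
        raise ValueError(f"not scenario-scoped or unknown: {sorted(unknown)}")
    # Preserve the original node order in the result.
    return [name for name in scenario_scoped if name in wanted]
```

Rejecting names that are not scenario-scoped is a design choice of this sketch; silently ignoring them would be another option.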

@jrobinAV jrobinAV added 🆘 Help wanted Open to participation from the community 🟩 Priority: Low Low priority and doesn't need to be rushed labels Dec 1, 2022
@jrobinAV jrobinAV added 🟨 Priority: Medium Not blocking but should be addressed and removed 🟩 Priority: Low Low priority and doesn't need to be rushed labels Jul 3, 2023
@jrobinAV jrobinAV added good first issue New-contributor friendly and removed 🆘 Help wanted Open to participation from the community labels Oct 9, 2023
@jrobinAV jrobinAV added 🟧 Priority: High Must be addressed as soon and removed 🟨 Priority: Medium Not blocking but should be addressed labels Oct 22, 2023
@jrobinAV jrobinAV transferred this issue from Avaiga/taipy-core Nov 13, 2023
joaoandre-avaiga pushed a commit that referenced this issue Nov 23, 2023
* feat: force to run core service even on development execution mode

* feat: warn if submit a pipeline when Core service is not run yet
@trgiangdo trgiangdo added this to the Community 3.1 milestone Dec 4, 2023
dinhlongviolin1 pushed a commit that referenced this issue Dec 4, 2023
@trgiangdo trgiangdo self-assigned this Jan 23, 2024
@trgiangdo
Member

There is a problem that needs to be clarified for this feature.

If the original scenario contains data nodes that have scope <= SCENARIO, we will need to duplicate their data.

  • If the data node is file-based (pickle, csv, excel, ...) and the user provided an explicit path in the original data node, what should the path of the duplicated data node be?
  • If the data node is SQL-based, what should the duplicated read and write queries be? The same question applies to other database-based data nodes, because two data nodes should not point to the same table, collection, ...

Please let me know what you think @FlorianJacta

@trgiangdo trgiangdo removed their assignment Feb 7, 2024
@trgiangdo trgiangdo removed the good first issue New-contributor friendly label Feb 7, 2024
@jrobinAV jrobinAV added 🟨 Priority: Medium Not blocking but should be addressed 💬 Discussion Requires some discussion and decision and removed 🟧 Priority: High Must be addressed as soon labels Jul 18, 2024
@jrobinAV
Member

jrobinAV commented Aug 26, 2024

I propose to reformulate the description of this ticket. Please let me know what you think.

What would that feature address
When new scenarios are created, we could duplicate the data of some data nodes to initialize the new scenario.
The motivation is to avoid executing tasks when it is not necessary.

Example:
Let's assume the data node B is scenario-scoped. We have one preprocessing task, T1, that is time-consuming. It reads A and writes B. We have a task T2 that reads B and C and writes D. We want to duplicate the scenario, keeping the data of A and B so we don't need to re-compute them. We just want to have a new scenario to vary the C data and recompute an alternative of D.

Scenario 1:

A --> T1 --> B ----> T2 --> D
             C --/

Scenario 2 as a duplication:

A' --> T1 --> B' ----> T2 --> D'
              C' --/

When scenario 2 is created, we want the following to hold:

A.read() == A'.read()
B.read() == B'.read()

Description of the ideal solution
Expose a new API to duplicate a scenario, given the list of data nodes whose data should be copied.
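To make the example concrete, here is a minimal in-memory sketch of the intended behavior. `ToyDataNode`, `ToyScenario`, and `duplicate` are hypothetical names invented for this illustration and do not correspond to the library's API; the values written to A, B, C and D are arbitrary.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class ToyDataNode:
    """Hypothetical in-memory data node (not the library's class)."""
    name: str
    data: Any = None
    written: bool = False

    def write(self, data: Any) -> None:
        self.data = data
        self.written = True

    def read(self) -> Any:
        return self.data


@dataclass
class ToyScenario:
    """Hypothetical scenario: a name plus its scenario-scoped data nodes."""
    name: str
    nodes: Dict[str, ToyDataNode] = field(default_factory=dict)


def duplicate(scenario: ToyScenario, new_name: str,
              data_to_copy: Optional[Iterable[str]] = None) -> ToyScenario:
    """Create fresh nodes for the new scenario; copy data only for selected ones."""
    selected = set(scenario.nodes) if data_to_copy is None else set(data_to_copy)
    twin_nodes = {}
    for name, node in scenario.nodes.items():
        twin = ToyDataNode(name)
        if name in selected and node.written:
            # Deep copy so later writes to one scenario never leak into the other.
            twin.write(copy.deepcopy(node.read()))
        twin_nodes[name] = twin
    return ToyScenario(new_name, twin_nodes)


# Scenario 1 from the example: A, B, C and D have all been written.
s1 = ToyScenario("scenario 1", {n: ToyDataNode(n) for n in "ABCD"})
for n, value in zip("ABCD", ([1, 2], [2, 4], 10, [12, 14])):
    s1.nodes[n].write(value)

# Scenario 2 keeps the data of A and B only; C' and D' start unwritten.
s2 = duplicate(s1, "scenario 2", data_to_copy=["A", "B"])
assert s2.nodes["A"].read() == s1.nodes["A"].read()
assert s2.nodes["B"].read() == s1.nodes["B"].read()
assert not s2.nodes["C"].written and not s2.nodes["D"].written
```

Whether the copied nodes would then be treated as up to date, so that T1 is skipped on the next submission, is not settled in this discussion and is left out of the sketch.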

@FlorianJacta
Member Author

The objective of this issue is to implement both a technical and functional feature.

Functional: From the user's perspective, duplicating a scenario is a logical and valuable action. After conducting an extensive analysis and modifying parameters X, Y, and Z, I run my code to observe the outcomes. If I want to see the impact of altering part of Y, I should be able to resubmit without redoing all my previous work. This process mirrors a common and intuitive workflow, akin to a "Save As" function that allows you to save a current scenario or results and then proceed with further analysis.

Technical: We aim to support this workflow while maintaining performance and user-friendliness. With a "Save As" option, it's important that the results aren't lost in the new scenario, and there should be a system to skip redundant operations since this is essentially duplicating a run that's already been completed.

I recognize the potential challenges with SQL read/write operations in this context. I don't have a definitive solution at this moment. Perhaps allowing users to select which data nodes to copy could help, but that might complicate the natural workflow I initially envisioned. Ultimately, this feature is more akin to a "Save As" function than anything else.

@jrobinAV
Member

@FlorianJacta Thanks, that is clearer.

@jrobinAV jrobinAV added 🆘 Help wanted Open to participation from the community hacktoberfest hacktoberfest issues hacktoberfest - 300💎💎💎 Issues rewarded by 300 points and removed 💬 Discussion Requires some discussion and decision labels Sep 25, 2024
@AnujSaha0111

Please assign this to me. I would like to work on it.

@jrobinAV
Member

jrobinAV commented Oct 3, 2024

@AnujSaha0111 Thank you for your help. You are assigned!

@AnujSaha0111

AnujSaha0111 commented Oct 3, 2024

I opened a PR for this issue 7 hours ago, but it is failing. Can you please check it?
#1896

@quest-bot quest-bot bot added the ⚔️ Quest Tracks quest-bot quests label Oct 7, 2024

quest-bot bot commented Oct 7, 2024

New Quest!

A new Quest has been launched in @Avaiga’s repo.
Merge a PR that solves this issue to loot the Quest and earn your reward.


Some loot has been stashed in this issue to reward the solver!

🗡 Comment @quest-bot embark to check-in for this Quest and start solving the issue. Other solvers will be notified!

⚔️ When you submit a PR, comment @quest-bot loot #397 to link your PR to this Quest.

Questions? Check out the docs.

Contributor

This issue has been labelled as "🥶Waiting for contributor" because it has been inactive for more than 14 days. If you would like to continue working on this issue, please add another comment or create a PR that links to this issue. If a PR has already been created which refers to this issue, then you should explicitly mention this issue in the relevant PR. Otherwise, you will be unassigned in 14 days. For more information please refer to the contributing guidelines.

@github-actions github-actions bot added the 🥶Waiting for contributor Issues or PRs waiting for a long time label Oct 23, 2024
Contributor

github-actions bot commented Nov 7, 2024

This issue has been unassigned automatically because it has been marked as "🥶Waiting for contributor" for more than 14 days with no activity.

@github-actions github-actions bot removed the 🥶Waiting for contributor Issues or PRs waiting for a long time label Nov 7, 2024
@FlorianJacta FlorianJacta added 🟧 Priority: High Must be addressed as soon and removed 🟨 Priority: Medium Not blocking but should be addressed labels Nov 13, 2024
@jrobinAV jrobinAV removed hacktoberfest hacktoberfest issues hacktoberfest - 300💎💎💎 Issues rewarded by 300 points ⚔️ Quest Tracks quest-bot quests labels Nov 18, 2024
@trgiangdo
Member

The "Save As" option sounds to me like using the export() API and then importing the scenario again.

However, the problem of new paths and new database targets for the duplicated data nodes still remains.

@toan-quach
Member

@jrobinAV Should we create the new scenario from the current one, just saving it as a different scenario under a new id?

@toan-quach
Member

@jrobinAV @FlorianJacta
Hmm, OK. After spending some time thinking over this ticket, my understanding is that we want to duplicate a scenario in order to:

  1. save time when creating a new one
  2. reduce execution time by reusing calculated results

I have several questions regarding this:

  1. If we want to reuse the data referred to by data nodes to calculate certain things, do we reuse all of it?
  • If we reuse all of it, then the results will likely be overwritten. I doubt that this is desirable.
  • If we don't, how should we determine which data/data nodes to refer back to, and which to create anew?
  2. If we want to create new data nodes that refer to old data, do we duplicate this old data as well?
  • If yes, there are naming issues to consider (for example, do we want to name the new data file output_datanode_id.csv?)
  • If not, we risk overwriting the results as mentioned above.
  3. Can't we just use the scenario config to create a new scenario? If we want to reuse some of the output data nodes, why don't we use the scope of the data nodes, making it either CYCLE or GLOBAL?

@jrobinAV
Member

Good questions!

We want to create new data nodes only for scenario-scoped data nodes.
We should copy the file and provide a new name for the file-based data nodes.
We should create a new table with the right schema for database-based data nodes.
We may delay the implementation for query-based data nodes (SQLDataNodes) as they seem more complex.
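The two copy strategies above could look roughly like the sketch below. It is only an illustration under stated assumptions: `copy_data_file` and `clone_table` are hypothetical helpers, the naming scheme (appending the new data node id to the file stem) is just one possible choice, SQLite stands in for whatever database is actually used, and copying the rows along with the schema is an assumption rather than something decided in this thread. Defaults, indexes, and foreign keys are not reproduced.

```python
import shutil
import sqlite3
from pathlib import Path


def copy_data_file(src: Path, new_node_id: str) -> Path:
    """Copy src next to itself under a name that embeds new_node_id (assumed scheme)."""
    dst = src.with_name(f"{src.stem}_{new_node_id}{src.suffix}")
    if dst.exists():
        # Never overwrite another data node's file.
        raise FileExistsError(dst)
    shutil.copy2(src, dst)
    return dst


def quote(identifier: str) -> str:
    """Quote an SQL identifier; identifiers cannot be bound as query parameters."""
    return '"' + identifier.replace('"', '""') + '"'


def clone_table(conn: sqlite3.Connection, table: str, new_table: str) -> None:
    """Create new_table with the same columns as table, then copy the rows."""
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    cols = conn.execute(f"PRAGMA table_info({quote(table)})").fetchall()
    if not cols:
        raise ValueError(f"unknown table: {table}")
    defs = []
    for _cid, name, col_type, notnull, _default, _pk in cols:
        parts = [quote(name), col_type]
        if notnull:
            parts.append("NOT NULL")
        defs.append(" ".join(p for p in parts if p))
    # pk > 0 gives the column's position within the primary key.
    pk_cols = [quote(c[1]) for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0]
    if pk_cols:
        defs.append(f"PRIMARY KEY ({', '.join(pk_cols)})")
    conn.execute(f"CREATE TABLE {quote(new_table)} ({', '.join(defs)})")
    conn.execute(f"INSERT INTO {quote(new_table)} SELECT * FROM {quote(table)}")
    conn.commit()
```

Failing when the destination file already exists addresses the overwriting concern raised earlier; a real implementation would also need to record the new path or table name on the duplicated data node.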

@toan-quach toan-quach self-assigned this Dec 26, 2024
@toan-quach toan-quach linked a pull request Dec 27, 2024 that will close this issue
5 tasks
Contributor

github-actions bot commented Jan 9, 2025

This issue has been labelled as "🥶Waiting for contributor" because it has been inactive for more than 14 days. If you would like to continue working on this issue, please add another comment or create a PR that links to this issue. If a PR has already been created which refers to this issue, then you should explicitly mention this issue in the relevant PR. Otherwise, you will be unassigned in 14 days. For more information please refer to the contributing guidelines.

@github-actions github-actions bot added the 🥶Waiting for contributor Issues or PRs waiting for a long time label Jan 9, 2025
@toan-quach
Member

#2373

@toan-quach toan-quach removed the 🥶Waiting for contributor Issues or PRs waiting for a long time label Jan 23, 2025
@toan-quach toan-quach linked a pull request Jan 23, 2025 that will close this issue
5 tasks