The goal is to ingest and permanently store all analysis reports from Konveyor, to use that data to determine and store solutions for fixed incidents, and to support future data extraction and model fine-tuning.
The process involves several key steps to ensure comprehensive analysis and storage of incidents and their solutions:
- Ingestion of Analysis Reports
- Incident Fix Detection
- Diff Generation and Storage
- Solution Validation
- Long-term Data Storage
All analysis reports generated by Konveyor are ingested and stored permanently. This ensures that all data is retained for future reference and analysis.
The incidents recorded in each analysis report, together with commit information, are used to determine whether an incident was fixed between two reports. Comparing consecutive reports identifies which incidents changed and which were resolved. This comparison is currently naive. Future processing should verify that an incident has not simply moved elsewhere in the file while remaining unfixed, and that the resulting diff is relevant to the documented issue.
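A minimal sketch of what such a naive comparison might look like is shown below. The `Incident` fields and the `find_fixed_incidents` helper are hypothetical names chosen for illustration; they are not taken from the Konveyor report schema, and the actual matching key may differ. The sketch assumes an incident counts as fixed when no incident with the same key appears in the newer report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields; the real report format may name these differently.
    ruleset: str
    violation: str
    file_path: str
    line_number: int

    def key(self, match_line: bool = True) -> tuple:
        """Identity used to match an incident across two reports."""
        base = (self.ruleset, self.violation, self.file_path)
        return base + (self.line_number,) if match_line else base


def find_fixed_incidents(old_report, new_report, match_line: bool = True):
    """Return incidents from old_report with no matching key in new_report."""
    remaining = {incident.key(match_line) for incident in new_report}
    return [i for i in old_report if i.key(match_line) not in remaining]


old_report = [
    Incident("eap8", "javax-to-jakarta", "src/Order.java", 10),
    Incident("eap8", "javax-to-jakarta", "src/Cart.java", 5),
]
# The Order.java incident moved from line 10 to line 12; Cart.java was fixed.
new_report = [Incident("eap8", "javax-to-jakarta", "src/Order.java", 12)]

strict = find_fixed_incidents(old_report, new_report)
loose = find_fixed_incidents(old_report, new_report, match_line=False)
```

With line numbers in the key, the incident that merely moved in `src/Order.java` is reported as fixed, which is the false positive described above. Dropping the line number avoids that case but would miss a file where only one of several identical violations was fixed, so neither key is sufficient on its own.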
When an incident is detected as fixed:
- The repository is checked out to the relevant commit.
- The original file is compared (diffed) against the updated file from the new analysis report.
- The generated diff is stored as the solution to the incident.
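The diff step above can be sketched with Python's standard `difflib` module. This is an illustration under assumptions, not the project's implementation: `make_solution_diff` is a hypothetical helper, and it takes the two file versions as strings, so it assumes the contents have already been read from the repository at the relevant commits.

```python
import difflib


def make_solution_diff(original: str, updated: str, path: str) -> str:
    """Build a unified diff between two versions of one file.

    Returns an empty string when the two versions are identical.
    """
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)


# Example file contents are made up for illustration.
before = "import javax.ejb.Stateless;\n\npublic class OrderService {}\n"
after = "import jakarta.ejb.Stateless;\n\npublic class OrderService {}\n"

solution = make_solution_diff(before, after, "src/OrderService.java")
```

The returned text is what would be stored as the solution for the fixed incident. A whole-file diff may include changes unrelated to the incident, which is one reason the later validation step matters.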
In the longer term, additional layers of processing will be implemented to ensure that the generated solutions are actually relevant to the identified problems. This validation step will improve the accuracy and usefulness of stored solutions.
Analysis reports are stored permanently. Solutions are stored in a separate portion of the database for use with Retrieval-Augmented Generation (RAG) prompts. Solutions are considered recomputable from the original reports. Permanent storage of analysis reports allows for:
- Future data extraction, if additional data is identified for extraction.
- Using the data for fine-tuning models, providing flexibility for ongoing improvements.
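One way to picture the separation between permanent reports and recomputable solutions is the sketch below, which uses SQLite from the Python standard library. The table names, columns, and helper functions are all assumptions made for illustration; the source does not specify a database engine or schema.

```python
import json
import sqlite3

# Hypothetical schema: reports are the permanent record, solutions are derived.
SCHEMA = """
CREATE TABLE IF NOT EXISTS analysis_reports (
    id          INTEGER PRIMARY KEY,
    application TEXT NOT NULL,
    commit_sha  TEXT NOT NULL,
    report_json TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS solutions (
    id               INTEGER PRIMARY KEY,
    before_report_id INTEGER NOT NULL REFERENCES analysis_reports(id),
    after_report_id  INTEGER NOT NULL REFERENCES analysis_reports(id),
    file_path        TEXT NOT NULL,
    diff             TEXT NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create both tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_report(conn, application: str, commit_sha: str, report: dict) -> int:
    """Persist a full analysis report and return its row id."""
    cur = conn.execute(
        "INSERT INTO analysis_reports (application, commit_sha, report_json) "
        "VALUES (?, ?, ?)",
        (application, commit_sha, json.dumps(report)),
    )
    conn.commit()
    return cur.lastrowid


def store_solution(conn, before_id: int, after_id: int, file_path: str, diff: str) -> int:
    """Persist a diff derived from two stored reports and return its row id."""
    cur = conn.execute(
        "INSERT INTO solutions (before_report_id, after_report_id, file_path, diff) "
        "VALUES (?, ?, ?, ?)",
        (before_id, after_id, file_path, diff),
    )
    conn.commit()
    return cur.lastrowid


def clear_solutions(conn) -> None:
    """Drop all derived solutions; stored reports are left untouched."""
    conn.execute("DELETE FROM solutions")
    conn.commit()


def count_rows(conn, table: str) -> int:
    # Table name comes from trusted code in this sketch, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Because every solution row points back to the two reports it was derived from, the solutions table can be cleared and rebuilt whenever the detection or validation logic changes, without touching the stored reports.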
This approach provides the following benefits:
- Comprehensive Data Retention: Permanent storage of all analysis reports ensures that no data is lost, supporting future analysis and extraction needs.
- Improved Incident Resolution: By detecting fixed incidents and storing relevant solutions, the system continually improves its ability to resolve incidents.
- Enhanced Model Training: Stored data can be used for fine-tuning models, enhancing their accuracy and performance over time.
- Future-proofing: The approach ensures that any new insights or extraction techniques can be applied to the entire history of analysis reports.