add avr xml generator

bergercookie · Feb 8, 2025 · 32912c6 · 32912c6
1 parent e925830
commit 32912c6
Show file tree

Hide file tree

Showing 31 changed files with 2,787 additions and 0 deletions.
diff --git a/docs_store/opcodes/avr_xml_gen/.gitignore b/docs_store/opcodes/avr_xml_gen/.gitignore
@@ -0,0 +1,3 @@
+intermediate_data/
+*.log
+__pycache__
diff --git a/docs_store/opcodes/avr_xml_gen/AVR-Instruction-Set-Manual-DS40002198A.pdf b/docs_store/opcodes/avr_xml_gen/AVR-Instruction-Set-Manual-DS40002198A.pdf
diff --git a/docs_store/opcodes/avr_xml_gen/README.md b/docs_store/opcodes/avr_xml_gen/README.md
@@ -0,0 +1,160 @@
+This is a program for generating avr instructions xml
+
+The rogram uses the avr datasheet, which you can find under [AVR-Instruction-Set-Manual-DS40002198A.pdf](https://ww1.microchip.com/downloads/en/DeviceDoc/AVR-InstructionSet-Manual-DS40002198.pdf)
+
+The whole programm is split into 4 stages:
+
+1. Data markers
+2. Data extraction
+3. Data analysis
+4. Xml generation
+
+For gui, [tkinter](https://docs.python.org/3/library/tkinter.html) is used.
+
+# 1. Data managinig and structuring
+
+Stored in `core/data_processing`.
+There are series of unfortunate titles such as `data_processing` or `data_processing.data_management`,
+because they don't point to the exact purpose of classes. This can be changed in
+the future, but so far, fate has decided.
+
+## 1.1 Storing instruction and its forms
+
+Each instruction form has several aspects. They are:
+
+1. Mnemonic ('name' in xml)
+2. Version (version of avr architecture)
+3. Description ('summary' in xml)
+4. Operands
+5. SREG (status registers)
+6. Opcode
+
+I extracted aspects 1, 3, and 4 from tables and 1, 2, 5, and 6 from chapters (datasheet,
+chapter 5). This is merged, so now 3 and 4 is stored as `table data`', while 5 and
+6  are stored as `chapter data`. Sometimes you can see subchapter instead of chapter
+because at the very beginning I wanted to name it 'subchapter'. In some cases, division
+into the chapter and subchapter makes sence (in the data extraction stage). Even
+though 2 (version) is extracted from a chapter, the version is chapter-data independent,
+and it was just convenient to extract versions from chapter.
+
+There is the class `data_management.InstructionForm` for storing `mnemonic`, `version`,
+`table_data` and `chapter_data`. It inherits `data_management.StrictDictionary`
+which is the same as default dictionary, but with fixed keys. This is helpful,
+as python points tells if there are wrong or missing keys.
+
+Instruction is stored as a dictionary with `str -> list[InstructionForm]`, where
+`str` is the instruction name.
+
+## 1.2 Context
+
+`data_management.Context` stores all data that could be used by stages. It stores
+instruction data markers, instruction data extracted from datasheet, parsed instruction
+data (instruction forms), and even the datasheet file name.
+
+Context has dictionary and works in the key-value concept, but with one feature:
+keys are data type. So it could store only one copy of particular data. This was
+better than a dictionary with str names or something else. During development, it
+became clear that this needed to be changed for Aliases.
+
+Some unique data (e.g `ExtractedInstructionData`, `ProcessedInstructionData`) could
+just be an `Alias`, but they would have been the same key of type `Alias` if I have
+left as it was. A future improvement could be to make context distinguish different
+aliases as keys.
+
+## 1.3 Ambiguous data
+
+In `ambiguous_data.py` you can see several default data types analogs (`dict`,
+`list`, `forzenset`) but with the 'Ambiguous' prefix.
+
+During the data extraction it became clear that it is useful to determine if a list
+or set has more than one object. It was too often so I decided to create the same
+classes but with a `is_ambiguous()` method and special output: it prints 'Ambiguous'
+if data is ambiguous. The dictionary is ambiguous if it has ambiguous value or key.
+The whole extracted instruction data is `AmbiguousDict[AmbiguousFrozen, AmbiguousList]`.
+
+## 1.4 Other random classes
+
+Observer and Subject:
+I used the observer pattern for stages, so I just shoved these classes here :)
+
+DataManager:
+Is used by stage (2). It has `request()` method. The implementation depends on a
+particular class. The biggest example of this is in `core/stages/data_analysis/instruction_forms_manager.py`.
+
+# 2. Stages
+
+All related to stages is stored in `core/stages`.
+
+## 2.1 The main tasks of the stages
+
+1. The data markers stage customizes the instruciton data markers that determine
+   what is useful data in the datasheet. It isn't used at this time because data extraction
+   stage isn't completed. Details are written in '2 Data extraction'. If you want to
+   see what is created for data markers, replace `app.after(100, stage2.execute)` with
+   `app.after(100, stage1.execute)`. This is a demonstration of a field selector. It
+   can catch pdf objects like images, tables and words and access to all of their attributes.
+
+2. The data extraction stage uses data markers created in first stage to extract
+   instruction data from the datasheet. This stage doesn't have a modular structure
+   and doesn't use markers, so it should be remade.
+
+3. Data analysis converts extracted data for direct xml compilation. This data is
+   a dictionary with instruction names as keys and lists of instruction forms as values.
+   Datasheet data isn't always ambiguous, which can be resolved by two ways: manually
+   by selecting needed options or programmatically by adding code to handle specific
+   cases. The first option is preferred.
+
+4. Xml generation speaks for itself. It uses processed data from the data analysis
+   stage.
+
+## 2.2 Stage concept itself
+
+Basic stage classes stored in `stage.py`
+
+Summary:
+
+The whole programm is split into stages. Each stage has next stage (it could be None),
+`execute()`, and `try_complete()` methods. It require `Context` (1.2) for initialization.
+
+At the very beginning we `execute()` the first stage. By default, the stage tries
+to be completed by `try_complete()` after the `<<Complete>>` event. If it is completed,
+the stage executes the next stage. Otherwise, its work is continued.
+
+`Stage` also has a `permanently_completed` variant. This is needed for `BidirectionStage`,
+which could go forwards and back. If `Stage` is permanently completed, it can't be
+`execute()`d.
+
+More details:
+In addition to the `Stage` class, there is also `StageTask`, `StageGUI` and `BidirectionalStage`.
+`StageTask` is used by `try_complete()` to check if completion conditions are satisfied
+by method `is_completed()`. It also requires `Context`.
+
+`StageGUI` is used if we need gui :). It has `enable()` and `disable()` methods.
+It requires `DataManager`. (1.4) `Stage.execute()` calls `StageGUIStage` GUI generates
+a `<<Complete>>` event. After generating this event, `Stage` executes `try_complete()`.
+
+In fact, `StageTask` and `StageGUI` are optional. See `xml_generation.py`.
+
+`BidirectionalStage` is the same as default stage, except that it also has the previous_stage.
+It inherits `Stage`.
+
+## 2.3 Stages implementation
+
+1. Data markers has one stage `SettingHeader`. It determines the height of the pdf
+   header. This is just a demonstration. For gui,`PDFRegionSelector`is used.
+
+2. Data extraction has one big process of data extraction, no more. As I mentioned
+   in 2.1, it could be cleaned up. It has gui with a 'waiting' caption.
+
+3. Data analysis has `InstructionBuilding` class. It is `BidirectionalStage`. An
+   `InstructionBuilding` instance is created for each instruction. It automatically
+   generates one singular instruction form and completes itself, if instruction data
+   isn't ambiguous (not `is_ambiguous()`, see `AmbiguousData` in 1.3), otherwise its
+   GUI is opened. In the GUI, the user creates forms and selects correct form aspects.
+   A future imrprovement could be adding saving for selected options. In `app.py` you
+   can see `InstructionBuildingInitializing`. It is needed for creating `InstructionBuilding`
+   instances.
+
+4. Xml generation has `XMLGeneration` which uses `InstructionXMLBuilder`. It is a
+   realy simple stage even without GUI and Task (task should be added to check if all
+   is good)
diff --git a/docs_store/opcodes/avr_xml_gen/app.py b/docs_store/opcodes/avr_xml_gen/app.py
@@ -0,0 +1,99 @@
+from core.logging import setup_logging
+setup_logging()
+
+from core.data_processing import Context, InstructionDataMarkers, SourceInfo, ExtractedInstructionsData, ProcessedInstructionsData
+from core.stages.data_markers import SettingHeader
+from core.stages.data_extraction import DataExtraction
+from core.stages.data_analysis import InstructionBuilding
+from core.stages.xml_generation import XMLGeneration
+from core.stages import Stage, BidirectionalStage, StageGUI, StageTask
+from core.data_processing.data_management import DataManager
+import tkinter as tk
+import os
+
+
+class InstructionBuildingInitializingTask(StageTask):
+    def is_completed(self) -> bool:
+        return True
+
+
+class InstructionBuildingInitializingGUI(StageGUI):
+    def __init__(self, master=None, cnf=..., **kwargs) -> None:
+        super().__init__(None, master, cnf, **kwargs)
+        self.__setup_ui()
+        self.__setup_bindings()
+
+    def __setup_ui(self):
+        self._label = tk.Label(self, text='Now we need to make up the forms\n' \
+                                          'Some instructions data is ambiguous\n' \
+                                          'In that case you need to make instruction froms manually'
+                               )
+        self._button = tk.Button(self, text='Resolve ambiguoity')
+        self._label.pack(side=tk.TOP, fill=tk.BOTH, expand=True)
+        self._button.pack(side=tk.BOTTOM, fill=tk.X, expand=True)
+
+    def __setup_bindings(self):
+        self._button.config(command=lambda: self.event_generate('<<Complete>>'))
+
+
+class InstructionBuildingInitializing(Stage):
+    def __init__(self, stages: set[Stage], context: Context, master=None, cnf=..., **kwargs) -> None:
+        super().__init__(context, master, cnf, **kwargs)
+        self.set_gui(InstructionBuildingInitializingGUI())
+        self.set_task(InstructionBuildingInitializingTask(self._context))
+        self._stages = stages
+
+    def execute(self) -> None:
+        extracted_data = self._context[ExtractedInstructionsData]
+        stages: list[BidirectionalStage] = []
+        for instruction_name in extracted_data.keys():
+            stages.append(InstructionBuilding(instruction_name, self._context))
+        for i in range(1, len(stages)):
+            stages[i-1].set_next(stages[i])
+            stages[i].set_previous(stages[i-1])
+        stages[-1].set_next(self._next_stage)
+        self._next_stage = stages[0]
+        self._stages.update(stages)
+        if self._gui is not None:
+            self._gui.enable(expand=True)
+
+
+class EndStageGUI(StageGUI):
+    def __init__(self, data_manager: DataManager | None = None, master=None, cnf=..., **kwargs) -> None:
+        super().__init__(data_manager, master, cnf, **kwargs)
+        self.label = tk.Label(self, text='Work is completed')
+        self.label.pack()
+
+
+class EndStage(Stage):
+    def __init__(self, context: Context, master=None, cnf=..., **kwargs) -> None:
+        super().__init__(context, master, cnf, **kwargs)
+        self._gui = EndStageGUI()
+
+
+def main():
+    app = tk.Tk()
+
+    context = Context()
+    context.record(ExtractedInstructionsData())
+    context.record(ProcessedInstructionsData())
+    source_info = SourceInfo()
+    source_info['pdf_path'] = os.path.abspath('./AVR-Instruction-Set-Manual-DS40002198A.pdf')
+    context.record(source_info)
+    context.record(InstructionDataMarkers())
+
+    stages: set[Stage] = set()
+    stage1 = SettingHeader(context)
+    stage2 = DataExtraction(context)
+    stage3 = InstructionBuildingInitializing(stages, context)
+    stage4 = XMLGeneration(context)
+    stage5 = EndStage(context)
+
+    stage1.set_next(stage2)
+    stage2.set_next(stage3)
+    stage3.set_next(stage4)
+    stage4.set_next(stage5)
+    app.after(100, stage2.execute)
+    app.mainloop()
+
+main()
diff --git a/docs_store/opcodes/avr_xml_gen/core/data_processing/__init__.py b/docs_store/opcodes/avr_xml_gen/core/data_processing/__init__.py
@@ -0,0 +1,2 @@
+from .ambiguous_data import *
+from .data_management import *
diff --git a/docs_store/opcodes/avr_xml_gen/core/data_processing/ambiguous_data.py b/docs_store/opcodes/avr_xml_gen/core/data_processing/ambiguous_data.py
@@ -0,0 +1,92 @@
+from typing import Optional, Collection, Iterable
+from colorama import Fore, init
+from abc import ABC, abstractmethod
+
+init(autoreset = True)
+
+
+class AmbiguousData(ABC):
+    @abstractmethod
+    def is_ambiguous(self) -> bool:
+        pass
+
+
+class AmbiguousList[T](list[T], AmbiguousData):
+    def __init__(self, single_data: Optional[T] = None):
+        if single_data is None:
+            super().__init__()
+        else:
+            super().__init__([single_data])
+
+    def append(self, single_data: T) -> None:
+        if single_data in self:
+            return
+        else:
+            super().append(single_data)
+
+    def extend(self, iterable: Iterable[T]) -> None:
+        new_data = super().extend(iterable)
+        super().clear
+        if not new_data:
+            return
+        new_data = set(new_data)
+        super().extend(new_data)
+
+    def is_ambiguous(self) -> bool:
+        if not self:
+            return False
+        elif any(isinstance(single_data, AmbiguousData) for single_data in self):
+            return True
+        return len(self) > 1
+
+    def __repr__(self) -> str:
+        if self.is_ambiguous():
+            return Fore.RED + 'Ambiguous:' + Fore.RESET + f'{super().__repr__()}'
+        elif not self:
+            return f'Empty'
+        else:
+            return f'{self[0]}'
+
+
+class AmbiguousDict[S, T](dict[S, T], AmbiguousData):
+    def is_ambiguous(self) -> bool:
+        if not self:
+            return False
+        elif any(isinstance(single_data, AmbiguousData) and single_data.is_ambiguous() for single_data in self.values()):
+            return True
+        else:
+            return False
+
+    def __repr__(self) -> str:
+        if not self:
+            return f'Empty'
+        elif not self.is_ambiguous():
+            return f'{super().__repr__()}'
+        else:
+            return Fore.RED + 'Ambiguous:' + Fore.RESET + f'{super().__repr__()}'
+
+
+class AmbiguousFrozen[T](frozenset[T], AmbiguousData):
+    def __new__(cls, data: Collection[T]):
+        return super().__new__(cls, data)
+
+    def is_ambiguous(self) -> bool:
+        return len(self) > 1
+
+    def __repr__(self) -> str:
+        if self.is_ambiguous():
+            return Fore.RED + 'Ambiguous:' + Fore.RESET + f'{sorted(self)}'
+        else:
+            return f'{next(iter(self), None)}'
+
+    def __or__(self, other) -> 'AmbiguousFrozen[T]':
+        return AmbiguousFrozen(super().__or__(other))
+
+    def __and__(self, other) -> 'AmbiguousFrozen[T]':
+        return AmbiguousFrozen(super().__and__(other))
+
+    def __xor__(self, other) -> 'AmbiguousFrozen[T]':
+        return AmbiguousFrozen(super().__xor__(other))
+
+    def __sub__(self, other) -> 'AmbiguousFrozen[T]':
+        return AmbiguousFrozen(super().__sub__(other))
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1,2 @@
		from .ambiguous_data import *
		from .data_management import *