Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent e925830, commit 32912c6
Showing 31 changed files with 2,787 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
intermediate_data/ | ||
*.log | ||
__pycache__ |
Binary file added (not shown), +1.14 MB: docs_store/opcodes/avr_xml_gen/AVR-Instruction-Set-Manual-DS40002198A.pdf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
This is a program for generating an XML description of the AVR instructions. | |
|
||
The program uses the AVR datasheet, which you can find at [AVR-Instruction-Set-Manual-DS40002198A.pdf](https://ww1.microchip.com/downloads/en/DeviceDoc/AVR-InstructionSet-Manual-DS40002198.pdf). | |
|
||
The whole program is split into 4 stages: | |
|
||
1. Data markers | ||
2. Data extraction | ||
3. Data analysis | ||
4. XML generation | |
|
||
For the GUI, [tkinter](https://docs.python.org/3/library/tkinter.html) is used. | |
|
||
# 1. Data management and structuring | |
|
||
This code is stored in `core/data_processing`. | |
Some names, such as `data_processing` or `data_processing.data_management`, are unfortunate | |
because they don't point to the exact purpose of the classes. This can be changed in | |
the future, but so far, fate has decided. | |
|
||
## 1.1 Storing instruction and its forms | ||
|
||
Each instruction form has several aspects. They are: | ||
|
||
1. Mnemonic ('name' in XML) | |
2. Version (version of the AVR architecture) | |
3. Description ('summary' in XML) | |
4. Operands | |
5. SREG (status register) | |
6. Opcode | |
|
||
I extracted aspects 1, 3, and 4 from tables and aspects 1, 2, 5, and 6 from chapters (datasheet, | |
chapter 5). The two sources are merged, so now 3 and 4 are stored as `table data`, while 5 and | |
6 are stored as `chapter data`. Sometimes you will see 'subchapter' instead of 'chapter' | |
because at the very beginning I wanted to name it 'subchapter'. In some cases, the division | |
into chapter and subchapter makes sense (in the data extraction stage). Even | |
though 2 (version) is extracted from a chapter, the version is independent of the chapter data; | |
it was just convenient to extract versions from the chapters. | |
|
||
The class `data_management.InstructionForm` stores `mnemonic`, `version`, | |
`table_data` and `chapter_data`. It inherits from `data_management.StrictDictionary`, | |
which is the same as the default dictionary, but with fixed keys. This is helpful | |
because Python then tells you if there are wrong or missing keys. | |
|
||
Instructions are stored as a dictionary `str -> list[InstructionForm]`, where | |
`str` is the instruction name. | |
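
The storage scheme described above can be sketched as follows. This is a minimal illustration, not the code from `data_management`: the `allowed_keys` attribute, the `missing_keys()` helper and the sample values are assumptions made for the example; only the class names and the four keys come from the text.

```python
class StrictDictionary(dict):
    """Hypothetical sketch: a dict that only accepts a fixed set of keys."""

    # Assumed mechanism: subclasses list their allowed keys here.
    allowed_keys: frozenset[str] = frozenset()

    def __setitem__(self, key, value):
        # Reject keys that are not part of the fixed set.
        if key not in self.allowed_keys:
            raise KeyError(f'unexpected key: {key!r}')
        super().__setitem__(key, value)

    def missing_keys(self) -> frozenset[str]:
        # Keys that are allowed but have not been filled in yet.
        return self.allowed_keys.difference(self)


class InstructionForm(StrictDictionary):
    allowed_keys = frozenset({'mnemonic', 'version', 'table_data', 'chapter_data'})


# Illustrative values only.
form = InstructionForm()
form['mnemonic'] = 'ADD'
form['version'] = 'AVRe'

# Instruction name -> list of its forms.
instructions: dict[str, list[InstructionForm]] = {'ADD': [form]}
```

In this sketch only item assignment is checked; `dict.update()` and the constructor bypass `__setitem__`, so a real implementation would likely need to cover them too.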
|
||
## 1.2 Context | ||
|
||
`data_management.Context` stores all data that can be used by the stages. It stores the | |
instruction data markers, the instruction data extracted from the datasheet, the parsed instruction | |
data (instruction forms), and even the datasheet file name. | |
|
||
Context holds a dictionary and works on the key-value concept, but with one feature: | |
the keys are data types, so it can store only one object of each type. This was | |
better than a dictionary with `str` names or something else. During development, it | |
became clear that this needs to change to support aliases. | |
|
||
Some unique data (e.g. `ExtractedInstructionsData`, `ProcessedInstructionsData`) could | |
just be an `Alias`, but they would all have shared the same key of type `Alias` if I had | |
left it as it was. A future improvement could be to make the context distinguish different | |
aliases as keys. | |
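
A type-keyed context along these lines can be sketched as below. The `record()` call and the `context[SomeType]` lookup mirror how `app.py` uses `Context`; the internals, the stand-in data classes and the alias names are assumptions made for the example.

```python
from typing import Any, TypeVar

T = TypeVar('T')


class Context:
    """Hypothetical sketch: stores exactly one object per data type."""

    def __init__(self) -> None:
        self._data: dict[type, Any] = {}

    def record(self, value: object) -> None:
        # The key is the runtime type, so a second object of the same
        # type replaces the first one.
        self._data[type(value)] = value

    def __getitem__(self, key: type[T]) -> T:
        return self._data[key]


# Stand-ins for the real data classes.
class SourceInfo(dict):
    pass


class ExtractedInstructionsData(dict):
    pass


context = Context()
info = SourceInfo(pdf_path='manual.pdf')
context.record(info)
context.record(ExtractedInstructionsData())

# Plain aliases share one runtime type, so they would collide as keys.
ExtractedAlias = dict[str, list]
ProcessedAlias = dict[str, list]
extracted: ExtractedAlias = {}
processed: ProcessedAlias = {}
same_key = type(extracted) is type(processed)  # both are plain dict
```

This is why distinct subclasses work as keys while aliases do not: an alias has no runtime type of its own.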
|
||
## 1.3 Ambiguous data | ||
|
||
In `ambiguous_data.py` you can see analogs of several built-in data types (`dict`, | |
`list`, `frozenset`), but with the 'Ambiguous' prefix. | |
|
||
During data extraction it became clear that it is useful to determine whether a list | |
or set has more than one object. This came up so often that I decided to create the same | |
classes but with an `is_ambiguous()` method and special output: it prints 'Ambiguous' | |
if the data is ambiguous. A dictionary is ambiguous if it has an ambiguous value (keys are not checked in `ambiguous_data.py`). | |
The whole extracted instruction data is `AmbiguousDict[AmbiguousFrozen, AmbiguousList]`. | |
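
The idea can be illustrated with a reduced version of the frozen set class. This is a simplified sketch without the colored output of `ambiguous_data.py`, and the sample values are made up for the example.

```python
class AmbiguousFrozen(frozenset):
    """Simplified sketch: a frozenset that knows whether it is ambiguous."""

    def is_ambiguous(self) -> bool:
        # More than one candidate value means the data is ambiguous.
        return len(self) > 1

    def __repr__(self) -> str:
        if self.is_ambiguous():
            return f'Ambiguous:{sorted(self)}'
        # A single value is shown as is; an empty set is shown as None.
        return f'{next(iter(self), None)}'


# Illustrative values only.
opcode = AmbiguousFrozen({'0000 11rd dddd rrrr'})
operands = AmbiguousFrozen({'Rd, Rr', 'Rd, K'})

print(opcode)    # 0000 11rd dddd rrrr
print(operands)  # Ambiguous:['Rd, K', 'Rd, Rr']
```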
|
||
## 1.4 Other random classes | ||
|
||
Observer and Subject: | ||
I used the observer pattern for stages, so I just shoved these classes here :) | ||
|
||
DataManager: | |
It is used by the stages (see 2). It has a `request()` method whose implementation depends on the | |
particular class. The biggest example of this is in `core/stages/data_analysis/instruction_forms_manager.py`. | |
|
||
# 2. Stages | ||
|
||
Everything related to stages is stored in `core/stages`. | |
|
||
## 2.1 The main tasks of the stages | ||
|
||
1. The data markers stage customizes the instruction data markers that determine | |
what is useful data in the datasheet. It isn't used at this time because the data extraction | |
stage isn't completed. Details are in item 2 (data extraction) below. If you want to | |
see what has been created for data markers, replace `app.after(100, stage2.execute)` with | |
`app.after(100, stage1.execute)`. This is a demonstration of a field selector. It | |
can catch PDF objects like images, tables and words and access all of their attributes. | |
|
||
2. The data extraction stage is meant to use the data markers created in the first stage to extract | |
instruction data from the datasheet. Currently this stage doesn't have a modular structure | |
and doesn't use the markers, so it should be remade. | |
|
||
3. Data analysis converts the extracted data for direct XML compilation. This data is | |
a dictionary with instruction names as keys and lists of instruction forms as values. | |
Datasheet data isn't always unambiguous. Ambiguity can be resolved in two ways: manually, | |
by selecting the needed options, or programmatically, by adding code to handle specific | |
cases. The first option is preferred. | |
|
||
4. XML generation speaks for itself. It uses the processed data from the data analysis | |
stage. | |
|
||
## 2.2 Stage concept itself | ||
|
||
The basic stage classes are stored in `stage.py`. | |
|
||
Summary: | ||
|
||
The whole program is split into stages. Each stage has a next stage (it could be `None`) and | |
`execute()` and `try_complete()` methods. It requires a `Context` (1.2) for initialization. | |
|
||
At the very beginning we `execute()` the first stage. By default, the stage tries | ||
to be completed by `try_complete()` after the `<<Complete>>` event. If it is completed, | ||
the stage executes the next stage. Otherwise, its work is continued. | ||
|
||
`Stage` also has a `permanently_completed` variant. This is needed for `BidirectionalStage`, | |
which can go forwards and backwards. If a `Stage` is permanently completed, it can't be | |
`execute()`d. | |
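
The flow described in this summary can be sketched without tkinter. This is a simplified model, not the code from `stage.py`: the completion check is passed in as a plain callable instead of a `StageTask`, `try_complete()` is called directly instead of reacting to the `<<Complete>>` event, and silently skipping a permanently completed stage is an assumption.

```python
from typing import Callable, Optional

executed: list[str] = []  # records the order in which stages run


class Stage:
    """Simplified sketch of a stage that hands over to the next one."""

    def __init__(self, name: str,
                 is_completed: Callable[[], bool] = lambda: True) -> None:
        self.name = name
        self._is_completed = is_completed  # stands in for StageTask.is_completed()
        self._next_stage: Optional['Stage'] = None
        self.permanently_completed = False

    def set_next(self, stage: Optional['Stage']) -> None:
        self._next_stage = stage

    def execute(self) -> None:
        if self.permanently_completed:
            return  # assumed behavior: the stage is simply skipped
        executed.append(self.name)

    def try_complete(self) -> bool:
        # In the real program this runs after the '<<Complete>>' event.
        if not self._is_completed():
            return False  # not done yet, the stage keeps working
        if self._next_stage is not None:
            self._next_stage.execute()
        return True


first = Stage('extraction')
second = Stage('analysis')
first.set_next(second)

first.execute()       # run the first stage
first.try_complete()  # completed, so the next stage is executed
```

After these calls `executed` holds the two stage names in order, which is the hand-over that the chain of `set_next()` calls in `app.py` relies on.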
|
||
More details: | |
In addition to the `Stage` class, there are also `StageTask`, `StageGUI` and `BidirectionalStage`. | |
`StageTask` is used by `try_complete()` to check whether the completion conditions are satisfied, | |
via its `is_completed()` method. It also requires a `Context`. | |
|
||
`StageGUI` is used if we need a GUI :). It has `enable()` and `disable()` methods. | |
It requires a `DataManager` (1.4). `Stage.execute()` calls `StageGUI.enable()`. The GUI generates | |
a `<<Complete>>` event. After this event is generated, `Stage` executes `try_complete()`. | |
|
||
In fact, `StageTask` and `StageGUI` are optional. See `xml_generation.py`. | ||
|
||
`BidirectionalStage` is the same as the default stage, except that it also has a previous stage. | |
It inherits from `Stage`. | |
|
||
## 2.3 Stages implementation | ||
|
||
1. Data markers has one stage, `SettingHeader`. It determines the height of the PDF | |
header. This is just a demonstration. For the GUI, `PDFRegionSelector` is used. | |
|
||
2. Data extraction is one big data extraction process, no more. As I mentioned | |
in 2.1, it could be cleaned up. It has a GUI with a 'waiting' caption. | |
|
||
3. Data analysis has the `InstructionBuilding` class, which is a `BidirectionalStage`. An | |
`InstructionBuilding` instance is created for each instruction. It automatically | |
generates a single instruction form and completes itself if the instruction data | |
isn't ambiguous (not `is_ambiguous()`, see `AmbiguousData` in 1.3); otherwise its | |
GUI is opened. In the GUI, the user creates forms and selects the correct form aspects. | |
A future improvement could be saving the selected options. In `app.py` you | |
can see `InstructionBuildingInitializing`. It is needed for creating the `InstructionBuilding` | |
instances. | |
|
||
4. XML generation has `XMLGeneration`, which uses `InstructionXMLBuilder`. It is a | |
really simple stage, even without a GUI and a task (a task should be added to check that all | |
is good). |
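
The per-instruction chain from item 3 can be sketched as a doubly linked list of stages. The `set_next()` and `set_previous()` names follow their use in `app.py`; the stand-in class, the `link()` helper and the instruction names are assumptions made for the example.

```python
from typing import Optional


class BidirectionalStage:
    """Simplified stand-in: only the links between stages are modelled."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next_stage: Optional['BidirectionalStage'] = None
        self.previous_stage: Optional['BidirectionalStage'] = None

    def set_next(self, stage: Optional['BidirectionalStage']) -> None:
        self.next_stage = stage

    def set_previous(self, stage: Optional['BidirectionalStage']) -> None:
        self.previous_stage = stage


def link(stages: list[BidirectionalStage]) -> None:
    # Connect every pair of neighbours in both directions.
    for previous, current in zip(stages, stages[1:]):
        previous.set_next(current)
        current.set_previous(previous)


# One stage per instruction name (illustrative names).
stages = [BidirectionalStage(name) for name in ('ADC', 'ADD', 'ADIW')]
link(stages)
```

With the links in place, the user can step forwards to the next instruction or back to the previous one while resolving ambiguous data.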
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
from core.logging import setup_logging | ||
setup_logging() | ||
|
||
from core.data_processing import Context, InstructionDataMarkers, SourceInfo, ExtractedInstructionsData, ProcessedInstructionsData | ||
from core.stages.data_markers import SettingHeader | ||
from core.stages.data_extraction import DataExtraction | ||
from core.stages.data_analysis import InstructionBuilding | ||
from core.stages.xml_generation import XMLGeneration | ||
from core.stages import Stage, BidirectionalStage, StageGUI, StageTask | ||
from core.data_processing.data_management import DataManager | ||
import tkinter as tk | ||
import os | ||
|
||
|
||
class InstructionBuildingInitializingTask(StageTask): | |
    def is_completed(self) -> bool: | |
        return True | |
|
||
|
||
class InstructionBuildingInitializingGUI(StageGUI): | |
    def __init__(self, master=None, cnf=..., **kwargs) -> None: | |
        super().__init__(None, master, cnf, **kwargs) | |
        self.__setup_ui() | |
        self.__setup_bindings() | |
|
||
    def __setup_ui(self): | |
        self._label = tk.Label( | |
            self, | |
            text='Now we need to make up the forms\n' | |
                 'Some instruction data is ambiguous\n' | |
                 'In that case you need to make instruction forms manually', | |
        ) | |
        self._button = tk.Button(self, text='Resolve ambiguity') | |
        self._label.pack(side=tk.TOP, fill=tk.BOTH, expand=True) | |
        self._button.pack(side=tk.BOTTOM, fill=tk.X, expand=True) | |
|
||
    def __setup_bindings(self): | |
        # The button reports completion through the '<<Complete>>' virtual event. | |
        self._button.config(command=lambda: self.event_generate('<<Complete>>')) | |
|
||
|
||
class InstructionBuildingInitializing(Stage): | |
    def __init__(self, stages: set[Stage], context: Context, master=None, cnf=..., **kwargs) -> None: | |
        super().__init__(context, master, cnf, **kwargs) | |
        self.set_gui(InstructionBuildingInitializingGUI()) | |
        self.set_task(InstructionBuildingInitializingTask(self._context)) | |
        self._stages = stages | |
|
||
    def execute(self) -> None: | |
        # Create one InstructionBuilding stage per extracted instruction. | |
        extracted_data = self._context[ExtractedInstructionsData] | |
        stages: list[BidirectionalStage] = [] | |
        for instruction_name in extracted_data.keys(): | |
            stages.append(InstructionBuilding(instruction_name, self._context)) | |
        # Link neighbouring stages in both directions. | |
        for i in range(1, len(stages)): | |
            stages[i-1].set_next(stages[i]) | |
            stages[i].set_previous(stages[i-1]) | |
        # Splice the chain between this stage and its original next stage. | |
        stages[-1].set_next(self._next_stage) | |
        self._next_stage = stages[0] | |
        self._stages.update(stages) | |
        if self._gui is not None: | |
            self._gui.enable(expand=True) | |
|
||
|
||
class EndStageGUI(StageGUI): | |
    def __init__(self, data_manager: DataManager | None = None, master=None, cnf=..., **kwargs) -> None: | |
        super().__init__(data_manager, master, cnf, **kwargs) | |
        self.label = tk.Label(self, text='Work is completed') | |
        self.label.pack() | |
|
||
|
||
class EndStage(Stage): | |
    def __init__(self, context: Context, master=None, cnf=..., **kwargs) -> None: | |
        super().__init__(context, master, cnf, **kwargs) | |
        self._gui = EndStageGUI() | |
|
||
|
||
def main(): | |
    app = tk.Tk() | |
|
||
    context = Context() | |
    context.record(ExtractedInstructionsData()) | |
    context.record(ProcessedInstructionsData()) | |
    source_info = SourceInfo() | |
    source_info['pdf_path'] = os.path.abspath('./AVR-Instruction-Set-Manual-DS40002198A.pdf') | |
    context.record(source_info) | |
    context.record(InstructionDataMarkers()) | |
|
||
    stages: set[Stage] = set() | |
    stage1 = SettingHeader(context) | |
    stage2 = DataExtraction(context) | |
    stage3 = InstructionBuildingInitializing(stages, context) | |
    stage4 = XMLGeneration(context) | |
    stage5 = EndStage(context) | |
|
||
    stage1.set_next(stage2) | |
    stage2.set_next(stage3) | |
    stage3.set_next(stage4) | |
    stage4.set_next(stage5) | |
    app.after(100, stage2.execute) | |
    app.mainloop() | |
|
||
main() |
2 changes: 2 additions & 0 deletions
docs_store/opcodes/avr_xml_gen/core/data_processing/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
from .ambiguous_data import * | ||
from .data_management import * |
92 changes: 92 additions & 0 deletions
docs_store/opcodes/avr_xml_gen/core/data_processing/ambiguous_data.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
from typing import Optional, Collection, Iterable | ||
from colorama import Fore, init | ||
from abc import ABC, abstractmethod | ||
|
||
init(autoreset = True) | ||
|
||
|
||
class AmbiguousData(ABC): | |
    @abstractmethod | |
    def is_ambiguous(self) -> bool: | |
        pass | |
|
||
|
||
class AmbiguousList[T](list[T], AmbiguousData): | |
    def __init__(self, single_data: Optional[T] = None): | |
        if single_data is None: | |
            super().__init__() | |
        else: | |
            super().__init__([single_data]) | |
|
||
    def append(self, single_data: T) -> None: | |
        # Skip values that are already stored, so the list holds unique items. | |
        if single_data in self: | |
            return | |
        else: | |
            super().append(single_data) | |
|
||
    def extend(self, iterable: Iterable[T]) -> None: | |
        # list.extend() returns None, so add items one by one through append(), | |
        # which skips duplicates and keeps the insertion order. | |
        for single_data in iterable: | |
            self.append(single_data) | |
|
||
    def is_ambiguous(self) -> bool: | |
        if not self: | |
            return False | |
        elif any(isinstance(single_data, AmbiguousData) for single_data in self): | |
            return True | |
        return len(self) > 1 | |
|
||
    def __repr__(self) -> str: | |
        if self.is_ambiguous(): | |
            return Fore.RED + 'Ambiguous:' + Fore.RESET + f'{super().__repr__()}' | |
        elif not self: | |
            return 'Empty' | |
        else: | |
            return f'{self[0]}' | |
|
||
|
||
class AmbiguousDict[S, T](dict[S, T], AmbiguousData): | |
    def is_ambiguous(self) -> bool: | |
        if not self: | |
            return False | |
        elif any(isinstance(single_data, AmbiguousData) and single_data.is_ambiguous() for single_data in self.values()): | |
            return True | |
        else: | |
            return False | |
|
||
    def __repr__(self) -> str: | |
        if not self: | |
            return 'Empty' | |
        elif not self.is_ambiguous(): | |
            return f'{super().__repr__()}' | |
        else: | |
            return Fore.RED + 'Ambiguous:' + Fore.RESET + f'{super().__repr__()}' | |
|
||
|
||
class AmbiguousFrozen[T](frozenset[T], AmbiguousData): | |
    def __new__(cls, data: Collection[T]): | |
        return super().__new__(cls, data) | |
|
||
    def is_ambiguous(self) -> bool: | |
        return len(self) > 1 | |
|
||
    def __repr__(self) -> str: | |
        if self.is_ambiguous(): | |
            return Fore.RED + 'Ambiguous:' + Fore.RESET + f'{sorted(self)}' | |
        else: | |
            return f'{next(iter(self), None)}' | |
|
||
    def __or__(self, other) -> 'AmbiguousFrozen[T]': | |
        return AmbiguousFrozen(super().__or__(other)) | |
|
||
    def __and__(self, other) -> 'AmbiguousFrozen[T]': | |
        return AmbiguousFrozen(super().__and__(other)) | |
|
||
    def __xor__(self, other) -> 'AmbiguousFrozen[T]': | |
        return AmbiguousFrozen(super().__xor__(other)) | |
|
||
    def __sub__(self, other) -> 'AmbiguousFrozen[T]': | |
        return AmbiguousFrozen(super().__sub__(other)) |