Feat/image folder support #39

LemurPwned · 2024-10-27T11:00:43Z

Summary by Sourcery

Introduce support for image folder input in the sampler, allowing processing of images directly from directories. Refactor buffer management and validation to improve robustness and flexibility. Enhance the sampler to handle both video and image inputs with the addition of the ImageSampler class. Add comprehensive tests for new and existing functionalities to ensure reliability.

New Features:

Add support for image folder input in the sampler, allowing users to process images directly from a directory.

Enhancements:

Refactor buffer management to include additional utility methods for length, iteration, and item access.
Improve buffer configuration validation by separating it into a dedicated function and adding checks for buffer type and size.
Enhance the sampler to handle both video and image inputs by introducing a new ImageSampler class.

Tests:

Add comprehensive tests for buffer operations, including adding, clearing, and handling duplicate images.
Introduce tests for the new ImageSampler and its configuration, ensuring correct sampling from image folders.
Implement tests for the worker class to verify frame processing and output path formatting.
Add tests for the LLaVA chat integration to ensure correct image description and video summarization.

sourcery-ai · 2024-10-27T11:00:46Z

Reviewer's Guide by Sourcery

This PR introduces support for sampling images from folders by implementing a new ImageSampler class alongside the existing VideoSampler. The implementation includes a significant refactoring of the sampler architecture, introducing a BaseSampler class for shared functionality. Buffer handling has been enhanced with additional methods and improved configuration validation. The changes are accompanied by comprehensive test coverage for the new functionality.

Class diagram for the new ImageSampler and refactored samplers

classDiagram
    class BaseSampler {
        +SamplerConfig cfg
        +FrameBuffer frame_buffer
        +BlurGate | ClipGate | PassGate gate
        +Counter stats
        +sample(str) Iterable~list~FrameObject~~
        +write_queue(str, Queue, str) void
        +init_sampler() void
        +flush_buffer() Iterable~list~FrameObject~~
        +process_frame(int, Image, float) Iterable~list~FrameObject~~
    }
    class ImageSampler {
        +ImageSamplerConfig cfg
        +Regex rgx
        +extract_frame_time(str, str) str
        +sample(str) Iterable~list~FrameObject~~
        +write_queue(str, Queue, str) void
    }
    class VideoSampler {
        +SamplerConfig cfg
        +sample(str, str) Iterable~list~FrameObject~~
        +write_queue(str, Queue, str) void
    }
    class SegmentSampler {
        +SamplerConfig cfg
        +sample(str, str) Iterable~list~FrameObject~~
        +write_queue(str, Queue, str) void
    }
    BaseSampler <|-- ImageSampler
    BaseSampler <|-- VideoSampler
    VideoSampler <|-- SegmentSampler
    class ImageSamplerConfig {
        +str frame_time_regex
    }
    class SamplerConfig {
        +float min_frame_interval_sec
        +bool keyframes_only
        +dict buffer_config
        +dict gate_config
    }
    SamplerConfig <|-- ImageSamplerConfig
    SamplerConfig <|-- BaseSampler
    SamplerConfig <|-- VideoSampler
    SamplerConfig <|-- SegmentSampler

Class diagram for the updated Buffer classes

classDiagram
    class FrameBuffer {
        +add(Image, dict) tuple~Image, dict~~
        +final_flush() Iterable~tuple~Image, dict~~
        +clear() void
        +__len__() int
        +__iter__() Iterator
        +__getitem__(str) tuple~Image, dict~~
        +popitem(bool) tuple~Image, dict~~
    }
    class HashBuffer {
        +add(Image, dict) tuple~Image, dict~~
        +final_flush() Iterable~tuple~Image, dict~~
        +clear() void
        +__len__() int
        +__iter__() Iterator
        +__getitem__(str) tuple~Image, dict~~
        +popitem(bool) tuple~Image, dict~~
    }
    class GridBuffer {
        +add(Image, dict) tuple~Image, dict~~
        +final_flush() Iterable~tuple~Image, dict~~
        +clear() void
        +__len__() int
        +__iter__() Iterator
        +__getitem__(str) tuple~Image, dict~~
        +popitem(bool) tuple~Image, dict~~
    }
    FrameBuffer <|-- HashBuffer
    FrameBuffer <|-- GridBuffer
    note for FrameBuffer "Enhanced with new methods for length, iteration, and item access"

File-Level Changes

Change	Details	Files
Introduced a new sampler architecture with base class and specialized implementations	Created BaseSampler class to share common sampling functionality Implemented ImageSampler for handling image folder sampling Refactored VideoSampler to inherit from BaseSampler Added frame time regex support for image file naming	`video_sampler/samplers/base_sampler.py` `video_sampler/samplers/image_sampler.py` `video_sampler/samplers/video_sampler.py` `video_sampler/samplers/__init__.py`
Enhanced buffer handling with new functionality and improved validation	Added length, iteration, and item access methods to buffer classes Split buffer configuration validation into a dedicated function Fixed buffer size comparison logic Made debug flag optional with default value	`video_sampler/buffer.py`
Added comprehensive test coverage for new functionality	Added tests for buffer operations and configuration Created tests for image sampling functionality Added worker tests for queue handling and output formatting Implemented tests for LLaVA chat integration	`tests/test_buffer.py` `tests/test_samplers.py` `tests/test_worker.py` `tests/test_llava_chat.py`
Refactored worker implementation for better image handling	Separated worker logic into dedicated module Added support for image folder output path handling Improved output path formatting options Enhanced summary collection functionality	`video_sampler/worker.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time. You can also use
this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey @LemurPwned - I've reviewed your changes - here's some feedback:

Overall Comments:

Consider standardizing error handling across samplers - the video sampler uses specific av.error exceptions while the image sampler uses broad Exception catching which could mask issues

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟡 Testing: 3 issues found
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

tests/test_buffer.py

tests/test_samplers.py

video_sampler/worker.py

tests/test_buffer.py

tests/test_samplers.py

tests/test_worker.py

tests/test_utils.py

LemurPwned · 2024-10-27T12:54:13Z

@sourcery-ai review

sourcery-ai

Hey @LemurPwned - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟡 Testing: 3 issues found
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

tests/test_samplers.py

tests/test_worker.py

sourcery-ai · 2024-10-27T12:55:50Z

video_sampler/worker.py

+            ft = f"{vbsn}_{ft}"
+        return os.path.join(output_path, f"{ft}.jpg")
+
+    def queue_reader(self, output_path, read_interval=0.1) -> None:


issue (complexity): Consider extracting summary generation logic into a dedicated SummaryManager class to improve code organization.

The queue_reader method mixes frame processing with summary generation logic, making it harder to maintain. Consider extracting the summary generation into a dedicated class:

class SummaryManager: def __init__(self, pool, desc_client, config, debug=False): self.pool = pool self.desc_client = desc_client self.config = config self.debug = debug self.futures = {} self.last_summary_time = -10 def maybe_submit_summary(self, frame, frame_time): if not self.pool: return if frame_time - self.last_summary_time < self.config.get("min_sum_interval", 30): return future = self.pool.submit(self.desc_client.summarise_image, frame) if self.debug: console.print(f"Submitting summary for frame {frame_time}", style="bold yellow") self.futures[frame_time] = future self.last_summary_time = frame_time

This simplifies the queue_reader to:

def queue_reader(self, output_path, read_interval=0.1) -> None: summary_mgr = SummaryManager(self.pool, self.desc_client, self.cfg.summary_config, self.cfg.debug) while True: if not self.q.empty(): for frame_object in self.q.get(): if frame_object.metadata.get("end", False): self.futures = summary_mgr.futures # preserve futures for collection return if frame_object.frame is not None and not self.devnull and isinstance(frame_object.frame, Image.Image): frame_object.frame.save( self.format_output_path(output_path, frame_object.metadata["frame_time"]) ) summary_mgr.maybe_submit_summary(frame_object.frame, frame_object.metadata["frame_time"]) time.sleep(read_interval)

This separation makes the code more maintainable by:

Isolating summary generation logic and state

Simplifying the main loop's flow

Making the timing constraints more explicit

tests/test_buffer.py

tests/test_samplers.py

tests/test_worker.py

tests/test_utils.py

LemurPwned · 2024-10-29T12:28:35Z

@fcakyon
this might work as you expected with the config, if you use that:

python3 -m video_sampler config ./configs/image_base.yaml "./folder-frames/worlds-smallest-cat-bbc" ./sampled-output/ --images

LemurPwned added 3 commits October 27, 2024 01:11

sketch of an image sampler

dab1997

moving video sampler to a separate folder

989ded7

reworking the struct and adding more tests

119af43

sourcery-ai bot reviewed Oct 27, 2024

View reviewed changes

LemurPwned added 3 commits October 27, 2024 13:31

extend some tests

037c553

extend some tests

e2f9bc5

further test coverage

20b6e2b

LemurPwned linked an issue Oct 27, 2024 that may be closed by this pull request

FR: eliminate redundant frames given a list of frames (instead of video) #38

Closed

sourcery-ai bot reviewed Oct 27, 2024

View reviewed changes

LemurPwned added 2 commits October 29, 2024 12:11

add iterators

5c3b552

updates to the image sampling

1149746

LemurPwned added 2 commits November 11, 2024 11:51

test updates

98ac351

readme update

f1e2c36

LemurPwned merged commit f7b1d8c into main Nov 11, 2024
3 checks passed

LemurPwned deleted the feat/image-folder-support branch November 11, 2024 11:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/image folder support #39

Feat/image folder support #39

LemurPwned commented Oct 27, 2024 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Oct 27, 2024 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

LemurPwned commented Oct 27, 2024

sourcery-ai bot left a comment

sourcery-ai bot Oct 27, 2024

LemurPwned commented Oct 29, 2024

Feat/image folder support #39

Feat/image folder support #39

Conversation

LemurPwned commented Oct 27, 2024 • edited by sourcery-ai bot Loading

Summary by Sourcery

sourcery-ai bot commented Oct 27, 2024 • edited Loading

Reviewer's Guide by Sourcery

Class diagram for the new ImageSampler and refactored samplers

Class diagram for the updated Buffer classes

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

Choose a reason for hiding this comment

LemurPwned commented Oct 27, 2024

sourcery-ai bot left a comment

Choose a reason for hiding this comment

sourcery-ai bot Oct 27, 2024

Choose a reason for hiding this comment

LemurPwned commented Oct 29, 2024

LemurPwned commented Oct 27, 2024 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Oct 27, 2024 •

edited

Loading