Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if lines is used once, define it as a generator #23

Closed
wants to merge 2 commits into from

Conversation

mgaitan
Copy link

@mgaitan mgaitan commented Aug 31, 2021

from #22

transparently optimise lines to a generator, if possible
Currently lines is always a list. It's impossible to do this right in general, but one way that could work well is to optimise if we detect it's only mentioned once and that mention is either in a comprehension, loop or the second argument to map.

I implemented exactly that.

PS: pyp is great, thanks!

Copy link
Owner

@hauntsaninja hauntsaninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and glad you find pyp useful.

and that mention is either in a comprehension, loop or the second argument to map.

The second part of that sentence is important, since we can't do list-like things to lines if it's a generator. Most notably the following would no longer work: cat README.md | pyp 'lines[4]' (but people may also want to pass lines to a function that expects a list or assign another variable to the value of lines and do arbitrary things to that variable, etc)

It's a little tricky to do right, especially while keeping the code clean looking. Out of curiosity, is lines being a list something that's actually caused an issue for you?

@mgaitan
Copy link
Author

mgaitan commented Sep 1, 2021

is lines being a list something that's actually caused an issue for you?

not really, although I'm still a very casual user. I've been a beginner with bash my entire life (happily, I guess) but I'm good enough with Python, so this looks pretty appropriate to calm my itches

So, I've just wanted to be helpful on this interesting project, nothing in particular.

cat README.md | pyp 'lines[4]'

I see. Well, I've just defined a wrapper that work as an indexable/sliceable iterator that also caches computed values. It seems to cover those cases

(pyp) tin@pop-os:~/lab/pyp$ cat lala.txt | pyp "[x.upper() for x in lines] + [x.title() for x in lines]"
LINEA 1
LINEA 2
LINEA 3
Linea 1
Linea 2
Linea 3
(pyp) tin@pop-os:~/lab/pyp$ cat lala.txt | pyp "lines[:2]"
linea 1
linea 2
(pyp) tin@pop-os:~/lab/pyp$ cat lala.txt | pyp "lines[2]"
linea 3
(pyp) tin@pop-os:~/lab/pyp$ cat lala.txt 
linea 1
linea 2
linea 3

However, I don't know if it worth the code pollution for little gain on most cases.

@hauntsaninja
Copy link
Owner

Thanks, that's a neat solution!

Pros:

  • When using lines we can stream output without waiting for stdin to finish (e.g. yes | pyp 'lines[1]' now works!)

Cons:

  • Still doesn't quite match list semantics in ways that could be surprising (e.g. next(iter(lines)); next(iter(lines)))
  • Adds another dependency on pyp in the output of --explain. There is precedent for this with pypprint, but that's a more trivial function (and also pyp has the --define-pypprint flag for inlining the dep). And like you say, more code in pyp

Missing feature:

  • Ideally when converting lines to something generator like, we'd also see memory and perf improvements. With a caching iterator, there's often no memory improvement, and benchmarking seq 1 999999 | pyp 'sum(map(int, lines))' indicates this slows pyp down a little (I get 260ms generator, 385ms list, 500ms Indexable).

I think this could cross into "mysterious" behaviour that pyp in general tries to avoid. Ideally, if I add support for user defined magic variables, Indexable would be something that people could just define in their pyp config and that way things would never be mysterious / people could customise the logic further. (I guess even today you could add Indexable into your pyp config, but you'd have to always do pyp 'Indexable(stdin)[1]' instead of say pyp 'ilines[1]').

Anyway, I think for now I'm not going to merge. If I get this more often as a feature request, or if I never get around to adding user defined magic vars, I might reconsider. Thanks again for this PR!

@mgaitan
Copy link
Author

mgaitan commented Sep 6, 2021

@hauntsaninja I agree! let me know if I could help with something less controversial :D

@hauntsaninja hauntsaninja mentioned this pull request Sep 9, 2021
6 tasks
@hauntsaninja
Copy link
Owner

I implemented the magic variable support in configs described in the roadmap issue, see https://github.com/hauntsaninja/pyp#pyp-lets-you-configure-your-own-magic

This means users should now be able to do:

λ export PYP_CONFIG_PATH=config
λ yes | pyp 'ilines[1]'        
y

where config contains the Indexable class you thought of:

λ cat config
import itertools

class Indexable:
    def __init__(self, it):
        self.it = iter(it)
        self._computed = []
        self._exhausted = False

    def __getitem__(self, index):
        try:
            max_idx = index.stop
        except AttributeError:
            max_idx = index
        n = max_idx - len(self._computed) + 1
        if n > 0:
            self._computed.extend(itertools.islice(self.it, n))
        return self._computed[index]

    def __iter__(self):
        if self._exhausted:
            for element in self._computed:
                yield element
        else:
            for element in self.it:
                self._computed.append(element)
                yield element
            self._exhausted = True

ilines = Indexable(z.rstrip('\n') for z in stdin)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants