feat: function to crawl replies (comments) to each post of channel? #49

Closed
vicru opened this issue Apr 28, 2023 · 5 comments
Labels
enhancement New feature or request

Comments


vicru commented Apr 28, 2023

Hi, is there a function built in tegracli to crawl not only the text of posts of a channel (tegracli does a great job in this respect) but actually to crawl all replies (comments) to each post of a telegram-channel?

tegracli yields the number of replies to a post (in the variable called replies). I have not found a way to actually crawl the text of those individual replies.

Thanks in advance for any insights.

Member

pekasen commented May 3, 2023

Hi,

tegracli does not (yet) offer this in a way that would make collecting replies at scale feasible. The get command exposes the replies parameter, which can be used to retrieve the replies to a single post. You could run an external loop over the post_ids you need and pass each id in the replies parameter.

However, I could imagine adding another command that gets the replies for a list of posts (comparable to the hydrate command), although I am not sure about the timeline.
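
The external loop described above could be sketched roughly as follows. This is only an illustration: `collect_replies` and `fetch_replies` are hypothetical names, and `fetch_replies` stands in for whatever call or command you use to retrieve the replies to a single post (for example, a wrapper around the get command with its replies parameter). No tegracli flags or signatures are assumed here.

```python
from typing import Callable, Dict, Iterable, List


def collect_replies(
    post_ids: Iterable[int],
    fetch_replies: Callable[[int], List[dict]],
) -> Dict[int, List[dict]]:
    """Call fetch_replies once per post id and gather the results by post id.

    fetch_replies is a hypothetical callable supplied by the caller; it should
    return the replies to one post as a list of dicts.
    """
    collected: Dict[int, List[dict]] = {}
    for post_id in post_ids:
        collected[post_id] = list(fetch_replies(post_id))
    return collected


if __name__ == "__main__":
    # Stub used only for demonstration; replace it with a real retrieval call.
    def stub_fetch(post_id: int) -> List[dict]:
        return [{"id": post_id + 1, "reply_to": {"reply_to_msg_id": post_id}}]

    print(collect_replies([123], stub_fetch))
```

Keeping the retrieval step behind a callable means the loop itself stays the same regardless of how a single post's replies are fetched.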

@pekasen pekasen added the enhancement New feature or request label May 3, 2023
@pekasen pekasen changed the title from "function to crawl replies (comments) to each post of channel?" to "feat: function to crawl replies (comments) to each post of channel?" May 3, 2023
Member

pekasen commented May 4, 2023

Hi @vicru,

upon further investigation I found that Telegram indeed gives the count of replies in the field you mentioned, e.g.:

{
  "id": 123,
  "replies": {
    "_": "MessageReplies",
    "replies": 3, 
    ...
  },
...
}

The API method mentioned above can give you the replies. However, if you have already crawled the channel history, you should already have the replies on your system!

Telegram marks replies in the following way: replying to a post creates another post in the channel, thus increasing the post count. That new post looks very much like a regular post. However, it has a field called reply_to, in which the post that was replied to is referenced via its post number.

{
  "id": 124,  # new message
  "reply_to": {
    "_": "MessageReplyHeader",
    "reply_to_msg_id": 123,  # referenced post id
    "reply_to_scheduled": false,
    "reply_to_peer_id": null,
    "reply_to_top_id": null
  },
  ...
}

I think the existing functionality of tegracli is sufficient for now, as we get the data we want by getting the channel history. Happy to hear your thoughts.
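
As a rough sketch of how the reply_to references could be used on an already crawled history, the snippet below groups messages by the post they reply to. It assumes each crawled message has been loaded as a Python dict shaped like the JSON examples above; `group_replies_by_parent` is a hypothetical helper name, and the sample messages are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_replies_by_parent(messages: Iterable[dict]) -> Dict[int, List[dict]]:
    """Map each referenced post id to the list of messages that reply to it."""
    replies_by_parent = defaultdict(list)
    for message in messages:
        # reply_to is missing or None for posts that are not replies.
        header = message.get("reply_to")
        if not header:
            continue
        parent_id = header.get("reply_to_msg_id")
        if parent_id is not None:
            replies_by_parent[parent_id].append(message)
    return dict(replies_by_parent)


if __name__ == "__main__":
    # Made-up sample data mirroring the structure shown above.
    sample = [
        {"id": 123, "message": "original post", "reply_to": None},
        {
            "id": 124,
            "message": "a reply",
            "reply_to": {"_": "MessageReplyHeader", "reply_to_msg_id": 123},
        },
        {"id": 125, "message": "unrelated post"},
    ]
    grouped = group_replies_by_parent(sample)
    print({parent: [m["id"] for m in replies] for parent, replies in grouped.items()})
    # prints {123: [124]}
```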

Member

FlxVctr commented May 4, 2023

Can this be closed then?

Author

vicru commented May 5, 2023

Thanks a lot for the quick reply!

Yup, tegracli does crawl the number of replies. I had a quick look at crawled observations whose reply_to value was not None, expecting to find a comment on a channel post in their corresponding message variable/column. However, those message rows do not contain comments/replies to a channel post, but channel posts. Am I looking in the wrong column, or does tegracli not crawl comments on channel posts for the moment?

Member

pekasen commented May 9, 2023

You're welcome. As I stated above, the replies are not delivered as markup for a specific channel post (as I understand you expect them to be) but rather as references between channel posts. Hence, in the above example, channel post 124 is a reply to post 123. If you inspect the data you have, you might see that these replies may originate from different accounts. References to the posting accounts are found in the from_author field.

Btw, none of this is related to how tegracli captures the data from the Telegram API; it reflects the Telegram API's data structure.
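
One way to inspect a crawl along these lines is to compare, for each post, the reply count reported in its replies field with the number of crawled messages that reference it through reply_to. The sketch below is only a diagnostic under stated assumptions: messages are loaded as dicts shaped like the JSON examples earlier in this thread, `reply_count_mismatches` is a hypothetical helper name, and the sample data is made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def reply_count_mismatches(messages: List[dict]) -> Dict[int, Tuple[int, int]]:
    """Return {post_id: (reported_count, found_count)} where the two differ."""
    # Count how many crawled messages point at each post via reply_to.
    found = Counter()
    for message in messages:
        header = message.get("reply_to") or {}
        parent_id = header.get("reply_to_msg_id")
        if parent_id is not None:
            found[parent_id] += 1

    mismatches = {}
    for message in messages:
        # The nested "replies" value is the count reported by Telegram.
        reported = (message.get("replies") or {}).get("replies") or 0
        if reported != found[message["id"]]:
            mismatches[message["id"]] = (reported, found[message["id"]])
    return mismatches


if __name__ == "__main__":
    # Made-up sample: post 123 reports 3 replies, but only one is in the crawl.
    sample = [
        {"id": 123, "replies": {"_": "MessageReplies", "replies": 3}},
        {"id": 124, "reply_to": {"reply_to_msg_id": 123}},
    ]
    print(reply_count_mismatches(sample))
    # prints {123: (3, 1)}
```

An empty result would suggest that every reported reply is present in the crawled history; a non-empty one points at posts worth inspecting by hand.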

@pekasen pekasen closed this as not planned May 11, 2023