Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduced a format for displaying the DCSR_matrix #1176

Closed
wants to merge 9 commits into from

Conversation

Sai-Suraj-27
Copy link
Collaborator

@Sai-Suraj-27 Sai-Suraj-27 commented Jul 5, 2023

Description

Added a __str__() method for printing the DCSR_matrix.
PyTorch displays the sparse tensors in this format:
image
The introduced format displays the following:
image

Issue/s resolved: #1040

Changes proposed:

  • Added a method for printing the DCSR_matrix.

Type of change

  • New feature (non-breaking change which adds functionality)

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Title of PR is suitable for corresponding CHANGELOG entry

Does this change modify the behaviour of other functions? If so, which?

No

@github-actions
Copy link
Contributor

github-actions bot commented Jul 5, 2023

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

@mrfh92
Copy link
Collaborator

mrfh92 commented Jul 10, 2023

@Sai-Suraj-27 Regarding the introduced format I have a question: your picture of the PyTorch print result shows that indices are given by a 2D-tensor (which makes sense because specifying the position of a non-zero entry in a matrix requires us to state row- and column-index). In your print result for DCSR, the index-tensor is 1D. Is this due to the CSR-format?

@Sai-Suraj-27
Copy link
Collaborator Author

Sai-Suraj-27 commented Jul 10, 2023

@Sai-Suraj-27 Regarding the introduced format I have a question: your picture of the PyTorch print result shows that indices are given by a 2D-tensor (which makes sense because specifying the position of a non-zero entry in a matrix requires us to state row- and column-index). In your print result for DCSR, the index-tensor is 1D. Is this due to the CSR-format?

@mrfh92 Yes, sir. In PyTorch's sparse tensor representation, the indices are stored as a 2D tensor. On the other hand, In CSR format, the indices are stored as a 1D tensor, where each element represents the column index of a non-zero entry.
We can also add and display the indptr array which will tell the number of non-zero entries in each row and from that and the indices tensor one can easily find out the (row, col) of all non-zero elements. Please tell me your opinion and if you want me to change anything else in this format of displaying the DCSR_matrix.

@mrfh92
Copy link
Collaborator

mrfh92 commented Jul 10, 2023

Hi @Sai-Suraj-27,
thanks for your fast anwer :) I would like to suggest to include the indptr as well such that the output of print gives the user full information needed to "reconstruct" the tensor. However, as I remember @ClaudiaComito has mentored the implementation of the sparse-module so far; so maybe its a better idea to wait until next week when she is back since she can give a more detailed feedback.

@Sai-Suraj-27
Copy link
Collaborator Author

Hi @Sai-Suraj-27, thanks for your fast anwer :) I would like to suggest to include the indptr as well such that the output of print gives the user full information needed to "reconstruct" the tensor. However, as I remember @ClaudiaComito has mentored the implementation of the sparse-module so far; so maybe its a better idea to wait until next week when she is back since she can give a more detailed feedback.

Ok, sir. I will wait for the feedback.

@github-actions
Copy link
Contributor

Thank you for the PR!

@github-actions
Copy link
Contributor

Thank you for the PR!

@mrfh92
Copy link
Collaborator

mrfh92 commented Jul 24, 2023

Hi @Sai-Suraj-27,
I think it would be the best option to leave indices and data completely away when displaying a sparse DNDarray. So is suffices to display that it is an object of class DCSR_matrix, its size, split and the number of non-zero entires nzz. The reason is that usually sparse arrays are used in a context where the data is so large that displaying single entries does not make sense anymore.

@Sai-Suraj-27
Copy link
Collaborator Author

Hi @Sai-Suraj-27, I think it would be the best option to leave indices and data completely away when displaying a sparse DNDarray. So is suffices to display that it is an object of class DCSR_matrix, its size, split and the number of non-zero entires nzz. The reason is that usually sparse arrays are used in a context where the data is so large that displaying single entries does not make sense anymore.

@mrfh92 sir, so do you want me to update the function such that we will be printing like this:
DCSR_matrix(size=(3, 3), nnz=8, split=None) (without indptr, data, and indices)

@github-actions
Copy link
Contributor

Thank you for the PR!

@@ -337,3 +337,22 @@ def __repr__(self) -> str:
if self.comm.rank != 0:
return ""
return print_string

def __str__(self) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if this function is moved to the core/printing module. There are cases like local print vs global print to be handled. See the __str__ method of the DNDarray class for more information.

And it is usually best to use the printing format used by PyTorch since it gives more information to the user. The information outputted by __str__ should prioritize user-readability more than anything else. Just the indptr and indices would be too difficult to understand. I, personally, would prefer to see the exact coordinates being printed out. I say this because the users of Heat shouldn't be expected to know about the CSR matrix and its format. They should just be able to use this more-efficient data structure without having to learn too much about it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @mrfh92 says also seems acceptable to me. Maybe the users won't want to see the actual data points. But my only problem with this solution is, the users wouldn't be able to see the data points even if they wanted to. There is no other way for them to get the coordinates if we don't show it in the print output.

Copy link
Contributor

github-actions bot commented Dec 4, 2023

This pull request is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stale label Dec 4, 2023
Copy link
Contributor

This pull request was closed because it has been inactive for 60 days since being marked as stale.

@github-actions github-actions bot closed this Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature-384 Sparse] Printing of DCSR_matrix
3 participants