New frontier for PyToCS: .NET data science and machine learning #75

Open
GeorgeS2019 opened this issue Nov 9, 2021 · 10 comments

Comments

@GeorgeS2019

GeorgeS2019 commented Nov 9, 2021

@uxmal You have been doing this for close to 6 years. Now we need to challenge you with something you would not have conceived of 6 years ago.

.NET is meeting python HALF WAY!

Instead of the usual Python-to-C# conversion, imagine that this task is made simpler and quick to verify by compiling successfully.

Recently, the Microsoft team made the drastic decision to make .NET C#/F# code as close as possible to Python in the context of PyTorch and TorchSharp, as shown in the attached image below.

Questions

When the Python and .NET code look almost the same, WHAT ADJUSTMENTS and MODIFICATIONS does PyToCs need to make this conversion succeed with high probability and with minimal post-conversion manual editing?

Can you use the Tests you have created to share your suggestions?

The real world end to end use case is discussed here.

Imagine that .NET Interactive integrates both PyToCs and Roslyn. When a Python Jupyter notebook is opened within .NET Interactive, the PyTorch code sections are extracted, converted to e.g. C# using PyToCs, and the conversion is verified by compiling internally with Roslyn. A compile failure will report which segments of the Python code fail to compile and are still incompatible with TorchSharp. This report is critical for accelerating TorchSharp binding code coverage using real-world scenarios.
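The extract, convert, compile, and report loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `convert` and `compiles` are hypothetical callables standing in for PyToCs and a Roslyn compile check (neither is a real Python API), and the notebook is assumed to be in the standard Jupyter JSON layout with a top-level `cells` list.

```python
import json
from typing import Callable, Dict, List


def extract_torch_cells(notebook_json: str) -> List[str]:
    """Return the source of every code cell that mentions torch."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Notebook files store source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if "torch" in source:
            cells.append(source)
    return cells


def build_report(
    cells: List[str],
    convert: Callable[[str], str],    # hypothetical stand-in for PyToCs
    compiles: Callable[[str], bool],  # hypothetical stand-in for a Roslyn check
) -> List[Dict[str, object]]:
    """Convert each cell and record whether the generated C# compiled."""
    report = []
    for index, python_source in enumerate(cells):
        csharp_source = convert(python_source)
        report.append({"cell": index, "compiled": compiles(csharp_source)})
    return report
```

The cells whose `compiled` flag is false would be the ones worth reporting back as gaps in TorchSharp coverage or in the converter.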

I hope it is clear. I hope this is an exciting exercise for the tool you conceived 6 years ago; the .NET deep learning community needs your contribution to extend your tool to a very interesting use case.

image

@uxmal
Owner

uxmal commented Nov 9, 2021

Hello, and thanks for your interest in pytocs! It's not quite clear to me what you're asking for but let me attempt to answer the questions you're asking.

When the Python and .NET code look almost the same, WHAT ADJUSTMENTS and MODIFICATIONS does PyToCs need to make this conversion succeed with high probability and with minimal post-conversion manual editing?

The Python code fragment in the screen shot could almost be handled by pytocs in the state it is in now. The main stumbling blocks are:

  • type inference. This is still not working as well as I'd like, but "nicely" written Python code can often result in partially OK results. A possible approach is to change pytocs to handle Python type annotations; right now pytocs parses them, but doesn't do anything with them.
  • semantic differences in the Python and C# languages. The dynamic nature of Python is sometimes hard to replicate in C# automatically.
  • various bugs. These need to be identified and chased down. This leads me to your next question:

Can you use the Tests you have created to share your suggestions?

I'm not sure what you're asking here, but you are more than welcome to contribute with pull requests of Python code fragments and their expected translation to C#. You can look at the examples in: https://github.com/uxmal/pytocs/blob/master/src/Pytocs.Tests/ParserAcceptanceTests.cs
I can then see what needs to be improved to make fully automatic translation work.

@GeorgeS2019
Author

GeorgeS2019 commented Nov 9, 2021

@uxmal

As more users join in porting PyTorch code to the corresponding TorchSharp code, we will have more converted TorchSharp code with which to "train" the conversion of PyTorch using PyToCs.

Given your 6 years of experience learned from sharing this project, and just by looking at the example provided, could you comment on or suggest how best to make the PyToCs conversion "practical"?

Should the community

  • Share an FAQ on how to simplify the conversion?
  • Are there user-defined rules in PyToCs that users can customize to make the PyToCs output look more like TorchSharp code?
  • Likewise, based on what PyToCs can achieve, how would you recommend that TorchSharp developers accommodate PyToCs output? I mean, is there a need for TorchSharp to be more flexible in accepting PyToCs-generated C# code?

Questions

John, I hope you find these questions interesting. This scenario is not restricted to TorchSharp; there are many .NET community projects that are based on Python code. Java to C# is less challenging than Python to C#.

Java to C# has also been better supported than Python to C# over the last decades.

PERHAPS, now that more .NET projects are attempting to look "Python-like" because of the huge interest in data science and machine learning, do you see a NEED to RETHINK the PyToCs design? If you were to start over, what would you do differently? MORE IMPORTANTLY, what would you recommend to these .NET communities?

Python Source: TEXT CLASSIFICATION WITH THE TORCHTEXT LIBRARY

from torch import nn

class TextClassificationModel(nn.Module):

    def __init__(self, vocab_size, embed_dim, num_class):
        super(TextClassificationModel, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)

PyToCs conversion

using nn = torch.nn;

public static class PyTorch {
    
    public class TextClassificationModel
        : nn.Module {
        
        public object embedding;
        
        public object fc;
        
        public TextClassificationModel(object vocab_size, object embed_dim, object num_class) {
            this.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse: true);
            this.fc = nn.Linear(embed_dim, num_class);
            this.init_weights();
        }
        
        public virtual object init_weights() {
            var initrange = 0.5;
            this.embedding.weight.data.uniform_(-initrange, initrange);
            this.fc.weight.data.uniform_(-initrange, initrange);
            this.fc.bias.data.zero_();
        }
        
        public virtual object forward(object text, object offsets) {
            var embedded = this.embedding(text, offsets);
            return this.fc(embedded);
        }
    }
}

Manual conversion

using static TorchSharp.torch;
using static TorchSharp.torch.nn;
using static TorchSharp.torch.nn.functional;

 class TextClassificationModel : Module
 {
     private Modules.EmbeddingBag embedding;
     private Modules.Linear fc;

     public TextClassificationModel(long vocab_size, long embed_dim, long num_class) : base("TextClassification")
     {
         embedding = EmbeddingBag(vocab_size, embed_dim, sparse: false);
         fc = Linear(embed_dim, num_class);
         InitWeights();

         RegisterComponents();
     }

     private void InitWeights()
     {
         var initrange = 0.5;

         init.uniform_(embedding.Weight, -initrange, initrange);
         init.uniform_(fc.Weight, -initrange, initrange);
         init.zeros_(fc.Bias);
     }

     public override Tensor forward(Tensor t)
     {
         throw new NotImplementedException();
     }

     public override Tensor forward(Tensor input, Tensor offsets)
     {
         using var t = embedding.forward(input, offsets);
         return fc.forward(t);
     }

     public new TextClassificationModel to(Device device)
     {
         base.to(device);
         return this;
     }
 }

@uxmal
Owner

uxmal commented Nov 12, 2021

I think the design of pytocs as it stands now is fine. It's a transpiler that converts Python source code to C# source code, trying to bridge the syntactic and semantic gap between the two languages. The biggest area for improvement is type inference support. It would be fantastic if pytocs could do a better job of inferring -- or using type hints -- to provide more accurate initial results. That's a question of people providing (small) samples of source code where they think pytocs could do a better job of inferring types, and fixing those. Naturally, contributions are welcome.
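To illustrate what using type hints could mean in practice, here is a minimal sketch that reads parameter annotations with Python's standard `ast` module and maps them to C# type names. It is not how pytocs is implemented: the `CSHARP_TYPES` table and the `csharp_parameters` helper are hypothetical, the choice of `int` to `long` simply mirrors the manual conversion earlier in this thread, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
from typing import List, Tuple

# Hypothetical mapping from Python annotation text to a C# type name.
CSHARP_TYPES = {"int": "long", "float": "double", "bool": "bool", "str": "string"}


def csharp_parameters(source: str, function_name: str) -> List[Tuple[str, str]]:
    """Return (parameter name, C# type) pairs for the named function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            result = []
            for arg in node.args.args:
                if arg.arg == "self":
                    continue
                if arg.annotation is None:
                    # No hint available: fall back to object.
                    cs_type = "object"
                else:
                    hint = ast.unparse(arg.annotation)
                    cs_type = CSHARP_TYPES.get(hint, "object")
                result.append((arg.arg, cs_type))
            return result
    raise ValueError(f"function {function_name!r} not found")


ANNOTATED = '''
class TextClassificationModel:
    def __init__(self, vocab_size: int, embed_dim: int, num_class):
        pass
'''

print(csharp_parameters(ANNOTATED, "__init__"))
```

In this sketch the annotated parameters come out as `long`, while the unannotated `num_class` falls back to `object`, which is the type the unannotated conversion above used for every parameter.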

I think providing a 100% automatic translation of idiomatic Python source code is not possible. There are constructs in Python that just cannot be translated easily/automatically to C#, but require human intervention. I've already outlined in the pytocs documentation (https://github.com/uxmal/pytocs/blob/master/doc/HOWTO.md) a suitable git workflow that can track an active Python project and generate C#. I use this workflow in my personal projects and it works just fine.

@GeorgeS2019
Author

John, thanks again for taking the time to share your insight, which is valuable and not easy to gain just by looking through the code.

We are currently running a one-week ML.NET hackathon. I will share your insight with other participants when they attempt to port Python code to .NET, e.g. for TorchSharp or TensorFlow.NET. Thank you.

@GeorgeS2019
Author

@uxmal a quick update. The decision to use PyTorch-like syntax in TorchSharp has led to more community adoption. The TorchSharp community has grown significantly, and PyTorch coverage is increasing at a steady pace.

@toolgood
Contributor

toolgood commented Feb 11, 2023

PyTorch and TorchSharp follow different design concepts. For example:

python code

self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)

C# code

this.conv1 = nn.Conv1d(inputChannel: d_model, outputChannel: d_ff, kernelSize: 1, bias: false);

Parameter names are different.

Some methods exist in PyTorch but are not in its documentation; TorchSharp does not support such methods.

See dotnet/TorchSharp#901.

@GeorgeS2019
Author

@toolgood if you look into PyToCs, the parameter names could be replaced, mapping the PyTorch names to the TorchSharp ones. This would help beginners coming from PyTorch adopt TorchSharp more quickly.
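One way such a parameter-name replacement might be done is as a preprocessing pass over the Python source before conversion. The sketch below is an assumption for illustration, not an existing PyToCs feature: `KEYWORD_RENAMES` and `rename_keywords` are hypothetical names, the only mapping entries are the three Conv1d names from the example above, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Keyword renames taken from the Conv1d example in this thread;
# any other layer would need its own (unverified) entries.
KEYWORD_RENAMES = {
    "Conv1d": {
        "in_channels": "inputChannel",
        "out_channels": "outputChannel",
        "kernel_size": "kernelSize",
    }
}


class RenameKeywords(ast.NodeTransformer):
    """Rename keyword arguments of calls listed in KEYWORD_RENAMES."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # handle nested calls first
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr      # e.g. nn.Conv1d -> "Conv1d"
        else:
            name = getattr(func, "id", None)
        renames = KEYWORD_RENAMES.get(name, {})
        for keyword in node.keywords:
            if keyword.arg in renames:
                keyword.arg = renames[keyword.arg]
        return node


def rename_keywords(source: str) -> str:
    """Return the source with known keyword arguments renamed."""
    tree = RenameKeywords().visit(ast.parse(source))
    return ast.unparse(tree)


line = "self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)"
print(rename_keywords(line))
# self.conv1 = nn.Conv1d(inputChannel=d_model, outputChannel=d_ff, kernelSize=1, bias=False)
```

Working on the syntax tree rather than on raw text means a variable that happens to be called `in_channels` is left alone; only keyword arguments of the listed calls are touched.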

@toolgood
Contributor

@uxmal I have written part of the code to convert to TorchSharp, using plain text replacement and regular expression replacement.
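The rules used in that code are not shown in this thread. As a hedged illustration of what replacement-based post-processing of generated C# might look like, the sketch below applies two hypothetical rules inferred from the manual conversion earlier in the thread, where the properties appear as `Weight` and `Bias`.

```python
import re

# Hypothetical rules, inferred from the manual conversion in this thread
# (embedding.Weight, fc.Bias); they are not the rules from the actual PR.
RULES = [
    (re.compile(r"\.weight\b"), ".Weight"),
    (re.compile(r"\.bias\b"), ".Bias"),
]


def apply_rules(csharp_source: str) -> str:
    """Apply each replacement rule in order to the generated C# text."""
    for pattern, replacement in RULES:
        csharp_source = pattern.sub(replacement, csharp_source)
    return csharp_source


print(apply_rules("this.fc.bias.data.zero_();"))
# this.fc.Bias.data.zero_();
```

Because each pattern requires a leading dot, a named argument such as `bias: false` is not affected. Text-level rules like these are simple to add, but they know nothing about types or scope, so each one would need checking against real converted code.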

@GeorgeS2019
Author

@uxmal Could you evaluate and then merge the PR submitted by @toolgood? :-)

@GeorgeS2019
Author

From @uxmal Nov 2021: You can look at the examples in: https://github.com/uxmal/pytocs/blob/master/src/Pytocs.Tests/ParserAcceptanceTests.cs
I can then see what needs to be improved to make fully automatic translation work.

@toolgood I have not looked into your PR yet; I am just curious whether you took @uxmal's suggestion into consideration. Perhaps @uxmal has additional suggestions?

3 participants