New frontier for PyToCS: .NET data science and machine learning #75
Comments
Hello, and thanks for your interest in pytocs! It's not quite clear to me what you're asking for but let me attempt to answer the questions you're asking.
The Python code fragment in the screenshot could almost be handled by pytocs in the state it is in now. The main stumbling blocks are:
I'm not sure what you're asking here, but you are more than welcome to contribute pull requests containing Python code fragments and their expected translation to C#. You can look at the examples in: https://github.com/uxmal/pytocs/blob/master/src/Pytocs.Tests/ParserAcceptanceTests.cs
As more users port PyTorch code to TorchSharp, we will have more converted TorchSharp code with which to "train" the conversion of PyTorch using PyToCs. Given your 6 years of experience with this project, and just by looking at the example provided, could you comment on or suggest how best to make the PyToCs conversion "practical"? Shall the community
Questions
John, I hope you find these questions interesting. This scenario is not restricted to TorchSharp; many .NET community projects are based on Python code. Java to C# is less challenging than Python to C#, and it has been better supported over the last decades. PERHAPS, now that more .NET projects are attempting to look Python-like because of the huge interest in data science and machine learning, do you see a NEED to RETHINK the PyToCs design? How would you do that if you were starting over, and what would you do differently? MORE IMPORTANTLY, what would you recommend to these .NET communities?

Python Source: TEXT CLASSIFICATION WITH THE TORCHTEXT LIBRARY

    from torch import nn

    class TextClassificationModel(nn.Module):

        def __init__(self, vocab_size, embed_dim, num_class):
            super(TextClassificationModel, self).__init__()
            self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
            self.fc = nn.Linear(embed_dim, num_class)
            self.init_weights()

        def init_weights(self):
            initrange = 0.5
            self.embedding.weight.data.uniform_(-initrange, initrange)
            self.fc.weight.data.uniform_(-initrange, initrange)
            self.fc.bias.data.zero_()

        def forward(self, text, offsets):
            embedded = self.embedding(text, offsets)
            return self.fc(embedded)

PyToCs conversion:

    using nn = torch.nn;

    public static class PyTorch {

        public class TextClassificationModel
            : nn.Module {

            public object embedding;
            public object fc;

            public TextClassificationModel(object vocab_size, object embed_dim, object num_class) {
                this.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse: true);
                this.fc = nn.Linear(embed_dim, num_class);
                this.init_weights();
            }

            public virtual object init_weights() {
                var initrange = 0.5;
                this.embedding.weight.data.uniform_(-initrange, initrange);
                this.fc.weight.data.uniform_(-initrange, initrange);
                this.fc.bias.data.zero_();
            }

            public virtual object forward(object text, object offsets) {
                var embedded = this.embedding(text, offsets);
                return this.fc(embedded);
            }
        }
    }

TorchSharp:

    using static TorchSharp.torch;
    using static TorchSharp.torch.nn;
    using static TorchSharp.torch.nn.functional;

    class TextClassificationModel : Module
    {
        private Modules.EmbeddingBag embedding;
        private Modules.Linear fc;

        public TextClassificationModel(long vocab_size, long embed_dim, long num_class) : base("TextClassification")
        {
            embedding = EmbeddingBag(vocab_size, embed_dim, sparse: false);
            fc = Linear(embed_dim, num_class);
            InitWeights();
            RegisterComponents();
        }

        private void InitWeights()
        {
            var initrange = 0.5;
            init.uniform_(embedding.Weight, -initrange, initrange);
            init.uniform_(fc.Weight, -initrange, initrange);
            init.zeros_(fc.Bias);
        }

        public override Tensor forward(Tensor t)
        {
            throw new NotImplementedException();
        }

        public override Tensor forward(Tensor input, Tensor offsets)
        {
            using var t = embedding.forward(input, offsets);
            return fc.forward(t);
        }

        public new TextClassificationModel to(Device device)
        {
            base.to(device);
            return this;
        }
    }
I think the design of pytocs as it stands now is fine. It's a transpiler that converts Python source code to C# source code, trying to bridge the syntactic and semantic gap between the two languages.

The biggest area for improvement is type inference support. It would be fantastic if pytocs could do a better job of inferring types, or of using type hints, to provide more accurate initial results. That's a question of people providing (small) samples of source code where they think pytocs could do a better job of inferring types, and fixing those. Naturally, contributions are welcome.

I think providing a 100% automatic translation of idiomatic Python source code is not possible. There are constructs in Python that just cannot be translated easily or automatically to C#, but require human intervention. I've already outlined in the pytocs documentation (https://github.com/uxmal/pytocs/blob/master/doc/HOWTO.md) a suitable
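To make the type-hint point concrete, here is a minimal Python sketch of how annotations expose types that a translator could read. It is not how pytocs works internally. The `CSHARP_TYPES` table and the `csharp_signature` helper are hypothetical names invented for illustration, and the class is a stand-in that deliberately does not import torch. Unannotated parameters fall back to `object`, which mirrors the PyToCs output shown earlier in this thread.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python annotations to C# type names (an assumption,
# not something taken from pytocs).
CSHARP_TYPES = {int: "long", float: "double", bool: "bool", str: "string", type(None): "void"}


class TextClassificationModel:
    """Stand-in for the model in this thread, with type hints added."""

    def __init__(self, vocab_size: int, embed_dim: int, num_class: int) -> None:
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.num_class = num_class


def csharp_signature(func) -> str:
    """Build a C#-style signature string from a function's type hints."""
    hints = get_type_hints(func)
    params = []
    for name in inspect.signature(func).parameters:
        if name == "self":
            continue
        # Anything without a known annotation degrades to `object`.
        params.append(f"{CSHARP_TYPES.get(hints.get(name), 'object')} {name}")
    return_type = CSHARP_TYPES.get(hints.get("return"), "object")
    return f"{return_type} {func.__name__}({', '.join(params)})"


print(csharp_signature(TextClassificationModel.__init__))
```

With the hints in place the constructor parameters come out as `long` rather than `object`, which is the kind of "more accurate initial result" described above.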
John, thanks again for taking the time to share your insight, which is valuable and not easy to gain just by looking through the code. We are currently running a one-week ML.NET hackathon. I will share your insight with the other participants when they attempt to port Python code to .NET, e.g. to TorchSharp or Tensorflow.NET. Thank you.
@uxmal a quick update. The decision to use PyTorch-like syntax in TorchSharp has led to more community adoption. The TorchSharp community has grown significantly and the degree of PyTorch coverage is increasing at a steady pace.
There are different design concepts between PyTorch and TorchSharp.
Python code:
    self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, kernel_size=1, bias=False)
C# code:
    this.conv1 = nn.Conv1d(inputChannel: d_model, outputChannel: d_ff, kernelSize: 1, bias: false);
The parameter names are different. Some methods exist in PyTorch but do not appear in its documentation, and TorchSharp does not support such methods. See dotnet/TorchSharp#901.
@toolgood if you look into PyToCS, the parameter names could be replaced from the PyTorch version to the TorchSharp version. This will speed up adoption of TorchSharp by beginners coming from PyTorch.
@uxmal I have written part of the code to convert to TorchSharp, using text replacement and regular-expression replacement.
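As an illustration of the text and regular-expression replacement idea, here is a minimal Python sketch that rewrites the Conv1d line quoted earlier in this thread. It is not the code from the PR mentioned here. The helper name `convert_call` is hypothetical, and the rename table contains only the three keyword names that appear in this thread; any further entries would need checking against the TorchSharp API. The approach is deliberately naive: it assumes a single call on one line and would mangle string literals that contain `=`.

```python
import re

# Keyword renames taken from the Conv1d example in this thread only.
KEYWORD_RENAMES = {
    "in_channels": "inputChannel",
    "out_channels": "outputChannel",
    "kernel_size": "kernelSize",
}

# An identifier followed by a single "=" (not "=="), i.e. a keyword argument.
_KEYWORD = re.compile(r"\b([A-Za-z_]\w*)\s*=(?!=)")


def _swap(match: re.Match) -> str:
    name = match.group(1)
    return f"{KEYWORD_RENAMES.get(name, name)}: "


def convert_call(line: str) -> str:
    """Rewrite one Python call statement into C#-style named-argument syntax."""
    head, paren, args = line.partition("(")
    if not paren:
        return line  # no call on this line, leave it alone
    args = _KEYWORD.sub(_swap, args)
    args = re.sub(r"\bTrue\b", "true", args)
    args = re.sub(r"\bFalse\b", "false", args)
    head = re.sub(r"\bself\.", "this.", head)
    return f"{head}({args};"


python_line = (
    "self.conv1 = nn.Conv1d(in_channels=d_model, out_channels=d_ff, "
    "kernel_size=1, bias=False)"
)
print(convert_call(python_line))
```

A purely textual pass like this knows nothing about scopes or types, so it is best seen as a post-processing step on top of a real parser rather than a replacement for one.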
@toolgood I have not looked into your PR yet; I am just curious whether you took @uxmal's comments into consideration. Perhaps @uxmal has additional suggestions?
@uxmal You have been doing this for close to 6 years. Now we need to challenge you with something you would not have conceived of 6 years ago.
.NET is meeting Python HALF WAY!
Instead of the usual Python to C#, imagine that this task is made simpler and quick to verify by successful compilation.
Recently, the Microsoft team made the drastic decision to make .NET C#/F# code as close as possible to Python in the context of PyTorch to TorchSharp, as shown in the attached image below.
Questions
When the Python and .NET code look almost the same, WHAT ADJUSTMENTS and MODIFICATIONS to PyToCs are needed to make this conversion succeed with high probability and with minimal manual editing after conversion?
Can you use the Tests you have created to share your suggestions?
The real world end to end use case is discussed here.
Imagine that .NET Interactive integrates both PyToCs and Roslyn, so that when a Python Jupyter notebook is opened within .NET Interactive, the PyTorch code sections are extracted, converted to e.g. C# using PyToCs, and the conversion is verified by compiling internally with Roslyn. Any compilation failure would report which segments of the Python code fail to compile and are still incompatible with TorchSharp. This report is critical for accelerating TorchSharp binding coverage using real-world scenarios.
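A rough Python sketch of the workflow just described, under stated assumptions: the notebook is plain nbformat-style JSON (a `cells` list whose entries have `cell_type` and `source`), and the `convert` and `compiles` hooks are hypothetical placeholders standing in for PyToCs and Roslyn. Neither tool is called here, and the function names are invented for illustration.

```python
import json
from typing import Callable, List, Tuple


def torch_code_cells(notebook_json: str) -> List[str]:
    """Return the source of every code cell that mentions torch."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source is stored either as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if "torch" in text:
            cells.append(text)
    return cells


def conversion_report(
    cells: List[str],
    convert: Callable[[str], str],
    compiles: Callable[[str], Tuple[bool, str]],
) -> List[Tuple[int, str]]:
    """Run each cell through the (hypothetical) hooks and collect failures."""
    failures = []
    for index, python_source in enumerate(cells):
        try:
            csharp_source = convert(python_source)
        except Exception as error:
            failures.append((index, f"conversion failed: {error}"))
            continue
        ok, diagnostics = compiles(csharp_source)
        if not ok:
            failures.append((index, diagnostics))
    return failures
```

The returned list of `(cell index, diagnostics)` pairs corresponds to the report mentioned above: each entry points at a Python segment that either could not be converted or did not compile.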
I hope this is clear. I hope this is an exciting exercise for the tool you conceived 6 years ago; the .NET deep learning community needs your contribution to extend your tool to a very interesting use case.