Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get wrong MassTraceDetection results #437

Open
4 tasks
xieyying opened this issue Jun 16, 2024 · 17 comments
Open
4 tasks

Get wrong MassTraceDetection results #437

xieyying opened this issue Jun 16, 2024 · 17 comments
Assignees
Labels

Comments

@xieyying
Copy link

Describe the problem you encountered
I simulated a LC-MS data from a compound and analyze it using pyopenms through Untargeted Metabolomics Pre-Processing pipline. But I can not get the expected mass traces even by exhaustive tuning the parameters. I also view our data using TOPPView. and the expected mass trace all can be found. I also check iod_df generated by MzMLFile().load() and I also can find our expected mass.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

What should be happening
I should find the expected mass trace at m/z 148.5713 (the most intense), 149.0726, 149.5749, and 150.0755. But now, I just can find mass traces at m/z 149.5741 and 150.0755 by MassTraceDetection().

Screenshots
image

System information:

  • OS: Windows-10-10.0.19045-SP0/Linux-5.4.0-159-generic-x86_64-with-glibc2.31
  • OS Version: 10.0.19045-SP0
  • Python version: 3.10.13/3.11.4
  • pyopenms version: 3.1.0
  • How did you install pyopenms? Please cross a box with an X.
    • conda::bioconda: Specify your conda/mamba command and/or your environment with conda list.
    • conda::openms: Specify your conda/mamba command and/or your environment with conda list.
    • [X ] pip
    • nightly wheel
    • built from source

Additional context
The .mzML and .py file is not supported to uploaded. and Here I just pasted our code:
class FeatureDetection:
"""
Extract features from mzML files using pyopenms
"""
def init(self, file):
self.file = file
self.exp = oms.MSExperiment()
self.mass_traces = []
self.mass_traces_deconvol = []
self.feature_map = oms.FeatureMap()
self.chrom_out = []

def load_file(self):
    """Load the mzML file"""
    oms.MzMLFile().load(self.file, self.exp)

def get_dataframes(self):
    """Get the ion dataframes and filter them"""
    self.ion_df = self.exp.get_ion_df()
    self.ion_df = self.ion_df[self.ion_df['inty'] > 0.0]
    # self.ion_df.to_csv(r'C:\Users\xyy\Desktop\test\tem\ion_df.csv', index=False)
    

def mass_trace_detection(self):
    """Detect the mass traces"""
    mtd = oms.MassTraceDetection()
    mtd_par = mtd.getDefaults()
    # print(mtd_par)
    mtd_par.setValue("mass_error_ppm", 20.0)
    mtd_par.setValue('min_trace_length', 1.0)
    mtd_par.setValue("noise_threshold_int", 0.0)
    mtd_par.setValue("chrom_peak_snr", 0.0)
    mtd_par.setValue('trace_termination_criterion', 'sample_rate')
    mtd_par.setValue('min_sample_rate', 0.5)
    mtd.setParameters(mtd_par)
    mtd.run(self.exp, self.mass_traces, 0)
    print(len(self.mass_traces))
    for i in range(len(self.mass_traces)):
        trace = self.mass_traces[i]
        print(f"Trace {i}:")
        print(f"Centroid MZ: {trace.getCentroidMZ()}")
        print(f"Centroid RT: {trace.getCentroidRT()}")
        print(f"Size: {trace.getSize()}")
        print("--------------------")    

def elution_peak_detection(self):
    """Detect the elution peaks"""
    epd = oms.ElutionPeakDetection()
    epd_par = epd.getDefaults()
    epd_par.setValue("width_filtering", "fixed")
    # print(epd_par)
    epd.setParameters(epd_par)
    epd.detectPeaks(self.mass_traces, self.mass_traces_deconvol)
    print(len(self.mass_traces_deconvol))


def feature_detection(self):
    """Detect the features"""
    ffm = oms.FeatureFindingMetabo()
    ffm_par = ffm.getDefaults()
    ffm_par.setValue("local_rt_range", 10.0)
    ffm_par.setValue("local_mz_range", 6.5)
    ffm_par.setValue("chrom_fwhm", 5.0)
    ffm_par.setValue('enable_RT_filtering', 'true')
    ffm_par.setValue('mz_scoring_by_elements', 'true')
    ffm_par.setValue("elements", "CHNOPSClBrFeBSeIF")
    ffm_par.setValue('isotope_filtering_model','none')
    ffm_par.setValue("remove_single_traces", "true")
    ffm_par.setValue("report_convex_hulls", 'true')
    ffm.setParameters(ffm_par)
    ffm.run(self.mass_traces_deconvol, self.feature_map, self.chrom_out)
    self.feature_map.setUniqueIds()
    self.feature_map.setPrimaryMSRunPath([self.file.encode()])
    print(self.feature_map.get_df())
    print(len(self.feature_map.get_df()))

def run(self):
    """Run the feature detection process"""
    self.load_file()
    self.get_dataframes()
    self.mass_trace_detection()
    self.elution_peak_detection()
    self.feature_detection()
    return self.feature_map
@xieyying xieyying added the bug label Jun 16, 2024
@xieyying
Copy link
Author

@jpfeuffer
Copy link
Contributor

jpfeuffer commented Jun 16, 2024

Hi! Just to clarify, what is the exact output?
You are saying that already the first step (i.e. MassTraceDetection) failed, right?

Do you think you could reduce the example/code to the first part that fails?

@xieyying
Copy link
Author

Sure,
Here is the reduced code:
import pyopenms as oms

class FeatureDetection:
"""
Extract features from mzML files using pyopenms
"""
def init(self, file):
self.file = file
self.exp = oms.MSExperiment()
self.mass_traces = []

def load_file(self):
    """Load the mzML file"""
    oms.MzMLFile().load(self.file, self.exp)

def get_dataframes(self):
    """Get the ion dataframes and filter them"""
    self.ion_df = self.exp.get_ion_df()
    print(self.ion_df)
    
def mass_trace_detection(self):
    """Detect the mass traces"""
    mtd = oms.MassTraceDetection()
    mtd_par = mtd.getDefaults()
    # print(mtd_par)
    mtd_par.setValue("mass_error_ppm", 20.0)
    mtd_par.setValue('min_trace_length', 1.0)
    mtd_par.setValue("noise_threshold_int", 0.0)
    mtd_par.setValue("chrom_peak_snr", 0.0)
    mtd_par.setValue('trace_termination_criterion', 'sample_rate')
    mtd_par.setValue('min_sample_rate', 0.5)        
    mtd.setParameters(mtd_par)
    mtd.run(self.exp, self.mass_traces, 0)
    print(len(self.mass_traces))
    for i in range(len(self.mass_traces)):
        trace = self.mass_traces[i]
        print(f"Trace {i}:")
        print(f"Centroid MZ: {trace.getCentroidMZ()}")
        print(f"Centroid RT: {trace.getCentroidRT()}")
        print(f"Size: {trace.getSize()}")
        print("--------------------")    

def run(self):
    """Run the feature detection process"""
    self.load_file()
    self.get_dataframes()
    self.mass_trace_detection()

And here is the results. The most intense ion at m/z 148.57 can be get at the step self.ion_df = self.exp.get_ion_df(), but the mass trace on 148.57 can not get at the step of mass_trace_detection.
image

@axelwalter
Copy link
Collaborator

@xieyying I sent a request to access your example data yesterday, could you please accept that? I will take a look.

@xieyying
Copy link
Author

The data can be directly downloaded without need to request.
https://drive.google.com/drive/folders/11Nz3HQvyH0_yk3DuL4DoW81iQuAJ2JQV?usp=drive_link

For this data, I try the following parameter in the step of mass_trace_detection.
mtd_par.setValue('trace_termination_criterion', 'sample_rate')
mtd_par.setValue('min_sample_rate', 0.01)
I can find the expected m/z, but not using other 'min_sample_rate‘(0.1-1)
It seems the parameter of 'trace_termination_criterion' affects the result significantly.
But, what real meaning of trace_termination_criterion? I can not get its real meaning from the instruction of these parameters.
And in another analysis of simulated LC-MS spectrum generated from hundreds of formula, we also encountered this problems. Many features can not be found. And the parameter of 'trace_termination_criterion' seems to affect the result significantly. In this case, the most numbers of features were found using min_sample_rate', 0.1

@axelwalter
Copy link
Collaborator

@xieyying Which of these files did you use in your example?

@jpfeuffer
Copy link
Contributor

trace_termination_criterion:

  • outlier: Uses 'trace_termination_outliers' and 'min_sample_rate'. Stops tracing from the apex when n='trace_termination_outliers' consecutive scans do not have a peak at an m/z that is in the range of the current weighted mean m/z. Then also filters traces where num_scans_with_peak_in_range / num_scans < min_sample_rate.

  • sample_rate: Uses 'min_sample_rate' only. Stops tracing and discards when num_scans_with_peak_in_range / num_scans < min_sample_rate.

@jpfeuffer
Copy link
Contributor

jpfeuffer commented Jun 18, 2024

In your small example, the problem might be that there are not enough scans to the left of the apex. Please try to run the example on a larger area around that feature/trace.
(untested conjecture)

@axelwalter
Copy link
Collaborator

What happens if you keep "trace_termination_criterion" as "outlier" but reduce the number of outliers with "trace_termination_outliers" to zero? The algorithm is looking for peaks left and right which have outlier m/z values. As @jpfeuffer mentioned, the main intensity traces start at the very beginning of your peak map. No chance to find the default 5 outliers to the left.

@xieyying
Copy link
Author

@xieyying Which of these files did you use in your example?
I am so sorry, this is the correct link.

https://drive.google.com/drive/folders/1CZBRSU_hdhq7UHXdqJYQSrMUVcSfarGy?usp=drive_link

@xieyying
Copy link
Author

What happens if you keep "trace_termination_criterion" as "outlier" but reduce the number of outliers with "trace_termination_outliers" to zero? The algorithm is looking for peaks left and right which have outlier m/z values. As @jpfeuffer mentioned, the main intensity traces start at the very beginning of your peak map. No chance to find the default 5 outliers to the left.

Combination of "outlier" and "zero" still can not provide correct result. Only min_sample_rate as 0.01 was added can generate the expected result.

@axelwalter
Copy link
Collaborator

The issue is your peak map contains only data related to that simulated metabolite, but no noise at all (not realistic). Once you add some noise left and right it will work as expected.

@jpfeuffer
Copy link
Contributor

@axelwalter Ok. But can't this be disabled somehow with the correct snr and intensity settings?

@xieyying
Copy link
Author

The issue is your peak map contains only data related to that simulated metabolite, but no noise at all (not realistic). Once you add some noise left and right it will work as expected.

Yes,I add some random noise and set a suitable noise threshold. The default values for the parameter of "trace_termination_outliers" worked well and I got my expected feature.

@axelwalter Ok. But can't this be disabled somehow with the correct snr and intensity settings?

Yes, when I set higher "noise_threshold_int", the situation goes back and the expected can not be found.

@jpfeuffer
Copy link
Contributor

jpfeuffer commented Jun 18, 2024

I was actually referring to the chrom_peak_snr setting. Noise_threshold_int is just a simple minimum intensity cutoff.

There must be a way to find this trace without noise, otherwise I would consider it a bug @axelwalter

@axelwalter
Copy link
Collaborator

So I generated my own test feature (Gaussian distribution) without noise and any peaks around it. Default settings work just fine. Not sure why that was not the case for @xieyying data. Perhaps because of the strong tailing.

@jpfeuffer
Copy link
Contributor

jpfeuffer commented Jun 19, 2024

Hmm but the MassTraceDetection alone does not care about intensities other than for the weighted mz of the trace and the order in which it seeds.
Maybe if it is so tailed that it does not pick up enough peaks to the left side? It must be really few though. I think it tries at least 5 scans in each direction. And if the min_sample_frequency is 0.5 this means 3 peaks need to be found to each side.

But glad that it works in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants