Title: | Quality Report Generation for MaxQuant and mzTab Results |
---|---|
Description: | Generates Proteomics (PTX) quality control (QC) reports for shotgun LC-MS data analyzed with the MaxQuant software suite (from .txt files) or mzTab files (ideally from OpenMS 'QualityControl' tool). Reports are customizable (target thresholds, subsetting) and available in HTML or PDF format. Published in J. Proteome Res., Proteomics Quality Control: Quality Control Software for MaxQuant Results (2015) <doi:10.1021/acs.jproteome.5b00780>. |
Authors: | Chris Bielow [aut, cre], Juliane Schmachtenberg [ctb], Swenja Wagner [ctb], Patricia Scheil [ctb], Tom Waschischek [ctb], Guido Mastrobuoni [dtc, rev] |
Maintainer: | Chris Bielow <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 1.1.1 |
Built: | 2024-11-07 04:42:55 UTC |
Source: | https://github.com/cbielow/ptxqc |
The following sections describe the main components:
Valid input data are either the files from MaxQuant's .txt folder (all versions from MaxQuant >= 1.0 upwards are supported) or a single mzTab file. All mzTab files will work, but most metrics can be obtained from OpenMS' mzTab as produced by the QualityControl TOPP tool (from OpenMS 2.5 onwards).
The central function of this package is called createReport and it accepts either MaxQuant or mzTab data, along with an optional configuration.
There is a parser for mzTab (MzTabReader) and for MaxQuant txt files (MQDataReader), as well as a plethora of QC metrics derived from a common qcMetric class, plus scoring functions (qual..., e.g. qualGaussDev).
The user can modify the behaviour of PTXQC, e.g. to enable/disable certain metrics or change scoring thresholds, via a YAML object/file.
By default, a YAML file is automatically written side-by-side to the input files the first time PTXQC is run on a particular input.
A custom YAML object can be passed to the main createReport function for customization.
Use yaml::yaml.load_file(input = 'myYAML.yaml') to load an existing file and pass the resulting YAML object along.
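For illustration, a minimal sketch of this workflow (the YAML filename and the txt folder path are placeholders):
yaml_config = yaml::yaml.load_file(input = 'myYAML.yaml')   ## load a previously written configuration
createReport(txt_folder = 'c:/data/Hek293/txt', yaml_obj = yaml_config)  ## re-run PTXQC with the customized settings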
The output is a PDF and/or HTML report which contains QC plots and a description of the metrics.
A string concatenation function, more readable than 'paste()'.
a %+% b
a |
Char vector |
b |
Char vector |
Concatenated string (no separator)
The input is a data frame containing feature evidence with corrected retention times, e.g. a 'calibrated.retention.time' column.
alignmentCheck(data, referenceFile)
data |
A data.frame with columns 'calibrated.retention.time', 'id', 'modified.sequence', 'charge', 'raw.file' and 'fraction' (if present) |
referenceFile |
A raw file name as occurring in data$raw.file, serving as alignment reference (when no fractions are used). |
Note that this function must be given real MS/MS identifications only (type "MULTI-MSMS") in order to work correctly!
For each peptide sequence (and charge) in the reference Raw file, this function looks up the already calibrated retention time difference of the same feature in all other files. For every comparison made, we report the RT difference. If alignment worked perfectly, the differences are very small (<1 min).
An 'id' column must be present, to enable mapping the result of this function to the original data frame.
A reference Raw file can be identified using 'findAlignReference()'. If MaxQuant's experimental design includes pre-fractionation, a column named 'fraction' should be given and 'referenceFile' should be empty. This function will pick the one Raw file for each fraction (the first in order) to use as reference. Only the immediately neighbouring fractions will be matched to this reference.
A data.frame containing the RT diff for each feature found in a Raw file and the reference.
The environment must exist, and its name must be given as string literal in 'env_name'! The value of the variable 'v' will be stored under the name given in 'v_name'. If 'v_name' is not given, a variable name will be created by increasing an internal counter and using its value padded with zeros as name (i.e. "0001", "0002", etc.).
appendEnv(env_name, v, v_name = NULL)
env_name |
Name of the environment (as a string) |
v |
Value to be inserted |
v_name |
String used as variable name. Automatically generated if omitted. |
Always TRUE
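A minimal usage sketch (assuming the environment is visible from the calling scope; 'myEnv' and 'answer' are illustrative names):
myEnv = new.env()
appendEnv("myEnv", v = 42, v_name = "answer")  ## returns TRUE; myEnv now holds a variable 'answer' with value 42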
Collects all 'mzQC' members from each entry in lst_qcMetrics and stores them in an overall mzQC object, which can be written to disk (see writeMZQC()) or augmented otherwise
assembleMZQC(lst_qcMetrics, raw_file_mapping)
lst_qcMetrics |
A list of qcMetric objects which have their mzQC member populated with "MzQCrunQuality" and/or "MzQCsetQuality" objects |
raw_file_mapping |
A data.frame with cols 'from', 'to' and maybe 'best.effort' (if shortening was unsuccessful), as e.g. obtained by a FilenameMapper$raw_file_mapping |
An MzQCmzQC object (root object of an mzQC document)
Each set has size set_size (internally optimized using correctSetSize), holding values from 'values'.
This gives n such sets and the return value is just the set index for each value.
assignBlocks(values, set_size = 5, sort_values = TRUE)
values |
Vector of values |
set_size |
Number of distinct values allowed in a set |
sort_values |
Before assigning values to sets, sort the values? |
Vector (same length as input) with set numbers
#library(PTXQC)
assignBlocks(c(1:11, 1), set_size = 3, sort_values = FALSE) ## --> 1 1 1 2 2 2 3 3 3 4 4 1
Given a data.frame with two/three columns in long format (name, value, [contaminant]; in that order), each group (given from 1st column) is plotted as a bar. Contaminants (if given) are separated and plotted as yellow bars.
boxplotCompare( data, log2 = TRUE, ylab = "intensity", mainlab = ylab, sublab = "", boxes_per_page = 30, abline = NA, coord_flip = TRUE, names = NA )
data |
Data frame in long format with numerical expression data |
log2 |
Apply log2 to the data (yes/no) |
ylab |
Label on Y-axis |
mainlab |
Main title |
sublab |
Sub title |
boxes_per_page |
Maximum number of boxplots per plot. Yields multiple plots if more groups are given. |
abline |
Draw a horizontal green line at the specified y-position (e.g. to indicate target median values) |
coord_flip |
Exchange Y and X-axis for better readability |
names |
An optional data.frame(long=.., short=..), giving a renaming scheme (long->short) for the 'name' column |
Boxes are shaded: many NA or Inf lead to more transparency, which makes sparse groups easy to spot.
List of ggplot objects
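A minimal sketch using synthetic long-format data (column content and values are illustrative):
data = data.frame(name = rep(c("file A", "file B"), each = 50),
                  value = c(2^rnorm(50, 20, 1), 2^rnorm(50, 22, 1)),
                  contaminant = FALSE)
pl = boxplotCompare(data, log2 = TRUE, ylab = "intensity", mainlab = "intensity distribution")
print(pl[[1]])  ## first page of boxplots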
Internally calls 'brewer.pal(n, palette)', checking 'n' beforehand.
brewer.pal.Safe(n = 3, palette = "Set1")
n |
Number of colours |
palette |
Name of palette (e.g. "Set1") |
character vector of colors
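A quick usage sketch:
cols = brewer.pal.Safe(n = 5, palette = "Set1")  ## character vector of 5 colours, e.g. "#E41A1C" "#377EB8" ...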
One subset consists of 'subset_size' unique groups and thus of all rows in 'data' which belong to any of these groups. The last subset might have fewer groups if the number of unique groups is not divisible by subset_size.
byX(data, indices, subset_size = 5, FUN, sort_indices = TRUE, ...)
data |
Data.frame whose subsets to use on FUN |
indices |
Vector of group assignments, same length as nrow(data) |
subset_size |
Number of groups to use in one subset |
FUN |
Function applied to subsets of data |
sort_indices |
Sort groups (by their sorted character(!) names) before building subsets |
... |
More arguments to FUN |
FUN is applied on each subset.
list of function result (one entry for each subset)
byX(data.frame(d=1:10), 1:10, 2, sum)
Same as byX, but with a more flexible group size, to avoid that the last group has only a few entries (<50% of the desired size). The 'subset_size' param is internally optimized using correctSetSize and then byX is called.
byXflex(data, indices, subset_size = 5, FUN, sort_indices = TRUE, ...)
data |
Data.frame whose subset to use on FUN |
indices |
Vector of group assignments, same length as nrow(data) |
subset_size |
Ideal number of groups to use in one subset; this can be changed internally, to between 75% and 150% |
FUN |
Function applied to subsets of data |
sort_indices |
Sort groups (by their character(!) names) before building subsets |
... |
More arguments to FUN |
list of function result (one entry for each subset)
stopifnot( byXflex(data.frame(d=1:10), 1:10, 2, sum, sort_indices = FALSE) == c(3, 7, 11, 15, 19) )
When MaxQuant is run with a wrong locale (i.e. the decimal separator is not a '.', but a ','), then MaxQuant results are plainly wrong and broken. This can be detected by, e.g., checking for negative charge annotations.
checkEnglishLocale(df_evd)
df_evd |
Evidence table from which we only need the 'charge' column |
qMBRSeg_Dist_inGroup might be empty if there are only singlets (transferred and genuine), but then the scores will be pretty boring as well (100%).
computeMatchRTFractions(qMBR, qMBRSeg_Dist_inGroup)
qMBR |
A data.frame as computed by peakSegmentation() |
qMBRSeg_Dist_inGroup |
A data.frame as computed by inMatchWindow() |
A data.frame which details the distribution of singlets and pairs (inRT and outRT) for each Raw file and genuine vs. all
This is useful for plotting large datasets where multiple pages are needed. E.g. you know that you need 101 barplots, but you only want to fit about 25 per page. Naively one would now do five plots, with the last one containing only a single barplot. Using this function with correctSetSize(101, 25) would tell you to use 26 barplots per page, so you end up with four plots, all roughly equally filled. It also works in the other extreme case, where the initial size is chosen slightly too high: e.g. sets of size 5 for just 8 items is too much, because we can reduce the set size to 4, still need two sets, but now they are much more equally filled (correctSetSize(8, 5) == 4).
correctSetSize(item_count, initial_set_size)
item_count |
Known number of items which need to assigned to sets |
initial_set_size |
Desired number of items a single set should hold |
We allow set sizes of up to 150% of the requested size, to avoid the last set being sparse (we remove it and distribute its items to the other bins). Once the number of sets is fixed, we distribute all items equally.
E.g. 6 items with initial_set_size=5 would result in 2 bins (5 + 1 items), but we'd rather have one bin of 6 items. 8 items with initial_set_size=5 would result in 2 bins (5 + 3 items); since the last set is more than half full, we'd rather have 4 + 4.
re-estimated set size which a set should hold in order to avoid underfilled sets
stopifnot( correctSetSize(8, 5) == 4 )
stopifnot( correctSetSize(101, 25) == 26 )
This is the main function of the package and the only thing you need to call directly if you are just interested in getting a QC report.
createReport( txt_folder = NULL, mztab_file = NULL, yaml_obj = list(), report_filenames = NULL, enable_log = FALSE )
txt_folder |
Path to txt output folder of MaxQuant (e.g. "c:/data/Hek293/txt") |
mztab_file |
As an alternative to **txt_folder**, you can provide a single mzTab file which contains PSM, PEP and PRT tables |
yaml_obj |
A nested list object with configuration parameters for the report. Useful to switch off certain plots or skip entire sections. |
report_filenames |
Optional list of output filenames (as generated by getReportFilenames()) |
enable_log |
If TRUE all console output (including warnings and errors) is logged to the file given in **report_filenames$log_file**. Note: warnings/errors can only be shown in either the log **or** the console, not both! |
You need to provide either a) the folder name of the 'txt' output as generated by MaxQuant, or b) an mzTab file, ideally as generated by the OpenMS QualityControl TOPP tool (other mzTab files will probably not work).
Optionally, provide a YAML configuration object, which allows you to (de)activate certain plots and holds other parameters.
The yaml_obj is complex and best obtained by running this function once using the default (empty list).
A full YAML configuration file will be written to the 'txt' folder you provide and can be loaded using yaml::yaml.load_file().
The PDF and the config file will be stored in the given txt folder.
List with named filename strings, e.g. $yaml_file, $report_file etc.
You need write access to the txt/mzTab folder!
For updates, bug fixes and feedback please visit https://github.com/cbielow/PTXQC.
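A minimal sketch of the most common call (the folder path is a placeholder):
r = createReport(txt_folder = "c:/data/Hek293/txt")
r$report_file  ## one of the returned filename strings (see Value above)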
Valid parameters are: param_useMQPAR, add_fs_col, id_rate_bad, id_rate_great, pg_ratioLabIncThresh, param_PG_intThresh, param_EV_protThresh, param_EV_intThresh, param_EV_pepThresh, yaml_contaminants, param_EV_MatchingTolerance, param_evd_mbr, param_EV_PrecursorTolPPM, param_EV_PrecursorOutOfCalSD, param_EV_PrecursorTolPPMmainSearch, param_MSMSScans_ionInjThresh, param_OutputFormats and param_PageNumbers
Please provide them as a list() of this format: list$parameter_name
createYaml( yc, param = list(), DEBUG_PTXQC = FALSE, txt_files = NULL, metrics = NULL )
yc |
A yaml class object created by YAMLClass$new() |
param |
list of parameters sorted by names; if empty, will be populated with defaults |
DEBUG_PTXQC |
print some debugging information; default FALSE |
txt_files |
list of paths to MaxQuant files; if NULL, it is assumed that the parameters are for mzTab-mode |
metrics |
list of metric names that should be plotted; if NULL, will be populated with defaults |
List of parameters used for creating the report and a list of qcMetric objects
Computes sd(x) / mean(x)
CV(x)
x |
Vector of numeric values |
CV
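A quick worked example:
CV(c(1, 2, 3, 4))  ## --> sd(1:4) / mean(1:4) = 1.29 / 2.5 = 0.516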
Make a color (given as name or in RGB) darker by a factor between 0 (black) and 1 (unchanged).
darken(color, factor = 0.8)
color |
A color as understood by col2rgb |
factor |
Between 0 (make black) and 1 (leave color as is) |
darkened color
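A quick usage sketch:
darken("red", factor = 0.5)  ## a darker shade of red
darken("red", factor = 0)    ## black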
Replace 0 with NA in a vector
del0(x)
x |
A numeric vector |
Vector of same size as 'x', with 0's replaced by NA
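A quick worked example:
del0(c(0, 1.5, 0, 3))  ## --> NA 1.5 NA 3.0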
You should provide only unique strings (to increase speed). If only a single string is given, the empty string will be returned unless min_out_length is set.
delLCP(x, min_out_length = 0, add_dots = FALSE)
x |
Vector of strings with common prefix |
min_out_length |
Minimal length of the shortest element of x after LCP removal [default: 0, i.e. empty string is allowed]. If the output would be shorter, the last part of the LCP is kept. |
add_dots |
Prepend output with '..' if shortening was done. |
Shortened vector of strings
delLCP(c("TK12345_H1"), min_out_length=0) ## "" delLCP(c("TK12345_H1"), min_out_length=4) ## "5_H1" delLCP(c("TK12345_H1"), min_out_length=4, add_dots = TRUE) ## "..5_H1" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=4) ## "5_H1" "5_H2" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=4, add_dots = TRUE) ## "..5_H1" "..5_H2" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=8) ## "12345_H1", "12345_H2" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=8, add_dots = TRUE) ## "TK12345_H1", "TK12345_H2" (unchanged, since '..' would add another two) delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=60) ## "TK12345_H1", "TK12345_H2" (unchanged) delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=60, add_dots = TRUE) ## "TK12345_H1", "TK12345_H2" (unchanged)
delLCP(c("TK12345_H1"), min_out_length=0) ## "" delLCP(c("TK12345_H1"), min_out_length=4) ## "5_H1" delLCP(c("TK12345_H1"), min_out_length=4, add_dots = TRUE) ## "..5_H1" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=4) ## "5_H1" "5_H2" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=4, add_dots = TRUE) ## "..5_H1" "..5_H2" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=8) ## "12345_H1", "12345_H2" delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=8, add_dots = TRUE) ## "TK12345_H1", "TK12345_H2" (unchanged, since '..' would add another two) delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=60) ## "TK12345_H1", "TK12345_H2" (unchanged) delLCP(c("TK12345_H1", "TK12345_H2"), min_out_length=60, add_dots = TRUE) ## "TK12345_H1", "TK12345_H2" (unchanged)
Removes the longest common suffix (LCS) from a vector of strings.
delLCS(x)
x |
Vector of strings with common suffix |
Shortened vector of strings
delLCS(c("TK12345_H1")) ## "" delLCS(c("TK12345_H1", "TK12345_H2")) ## "TK12345_H1" "TK12345_H2" delLCS(c("TK12345_H1", "TK12!45_H1")) ## "TK123" "TK12!"
delLCS(c("TK12345_H1")) ## "" delLCS(c("TK12345_H1", "TK12345_H2")) ## "TK12345_H1" "TK12345_H2" delLCS(c("TK12345_H1", "TK12!45_H1")) ## "TK123" "TK12!"
Make sure to call $readMappingFile(some_file) if you want to support a user-defined file mapping. Otherwise, calls to $getShortNames() will create/augment the mapping for filenames.
raw_file_mapping
Data.frame with columns 'from', 'to' and maybe 'best.effort' (if shortening was unsuccessful)
mapping.creation
how the current mapping was obtained (user or auto)
external.mapping.file
Filename of user-defined mapping file; only defined if readMappingFile() was called
getShortNamesStatic(raw.files, max_len, fallbackStartNr = 1)
Static method: Shorten a set of Raw file names and return a data frame with the mappings. Mapping will have: $from, $to and optionally $best.effort (if shortening was unsuccessful and numbers had to be used)
raw.files
Vector of Raw files.
max_len
Maximal length of shortening results, before resorting to canonical names (file 1,...).
fallbackStartNr
Starting index for canonical names.
data.frame with mapping.
a = FilenameMapper$new()
a$readMappingFile('filenamemapping.txt')
There is only one reference point which has '0' in 'retention.time.calibration' column in evidence.txt as corrected RT. This is true for most MaxQuant versions and also true for fractions. However, some evidence.txt files show 0.03 as an averaged minimum per Raw file. We use the raw.file with the smallest average as reference.
findAlignReference(data)
data |
The data.frame with columns 'retention.time.calibration' and 'raw.file' |
Note that MaxQuant uses a guide tree to align the Raw files, so the order of files does not influence the alignment. But the first file will always be used as reference point when reporting delta-RTs. And this file is also used by PTXQC as reference file vs all other files to find the real calibration function (see alignmentCheck()).
This function might return multiple raw file names (if MQ decides to change its mind at some point in the future). In this case the result should be treated with caution or (better) regarded as failure.
List of reference raw files (usually just one)
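A minimal sketch with synthetic evidence data (values are illustrative):
data = data.frame(retention.time.calibration = c(0, 0, 0.2, 0.3),
                  raw.file = c("file A", "file A", "file B", "file B"))
findAlignReference(data)  ## --> "file A" (smallest average correction)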
Returns a list of items for both diagnostics and possibly a fixed evidence data.frame. Also two strings with messages are returned, which can serve as user message for pre and post calibration status.
fixCalibration( df_evd, df_idrate = NULL, tolerance_sd_PCoutOfCal = 2, low_id_rate = 1 )
df_evd |
Evidence data.frame with columns () |
df_idrate |
Data.frame from summary.txt, giving ID rates for each raw file (cols: "ms.ms.identified....", "fc.raw.file"). Can also be NULL. |
tolerance_sd_PCoutOfCal |
Maximal standard deviation allowed before considered 'failed' |
low_id_rate |
Minimum ID rate in Percent before a Raw file is considered 'failed' |
list of data (stats, affected_raw_files, df_evd, recal_message, recal_message_post)
Flatten lists of lists with irregular depths to just a list of items, i.e. a list of the leaves (if you consider the input as a tree).
flattenList(x)
x |
List of 'stuff' (could be lists or items or a mix) |
A flat list
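A quick worked example:
flattenList(list(1, list(2, list(3, 4))))  ## --> a flat list holding 1, 2, 3 and 4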
Abundances (should be logged already) are grouped into different levels, starting from the smallest values ("low") to the highest values ("high"). Intermediate abundances are either assigned as "mid", or "low-mid". If the range is too large, only "low" and "high" are assigned, the intermediate values are just numbers.
getAbundanceClass(x)
x |
Vector of numeric values (in log10) |
Example: getAbundanceClass(c(12.4, 17.1, 14.9, 12.3)) ## --> factor(c("low", "high", "mid", "low"))
Vector of factors corresponding to input with abundance class names (e.g. low, high)
Estimate the empirical cumulative distribution function (ECDF) and return it
getECDF(samples, y_eval = (1:100)/100)
samples |
Vector of input values (samples from the distribution) |
y_eval |
Vector of points where CDF is evaluated (each percentile by default) |
Data.frame with columns 'x', 'y'
plot(getECDF(rnorm(1e4)))
Determine if a file is 'UTF-8' or 'UTF-8-BOM' (as of MQ2.4) or 'UTF-16BE' or 'UTF-16LE'
getFileEncoding(filename)
filename |
Relative or absolute path to a file |
The detected file encoding as a string, or "" if the file does not exist or is not readable
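A minimal sketch (the filename is a placeholder and must point to an existing, readable file):
getFileEncoding("evidence.txt")  ## e.g. "UTF-8"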
Given a data.frame as obtainable from an msms.txt with
- a 'mass.analyzer' column which contains only a single value for the whole column
- a 'mass.deviations..da.' and (if available) 'mass.deviations..ppm.' column
- a 'masses' column (only required if 'mass.deviations..ppm.' is unavailable and the mass analyzer indicates high-res data)
getFragmentErrors(x, recurse = 0)
x |
Data.frame from msms.txt with the columns described above |
recurse |
Internal usage only. Leave at 0 when calling. |
Mass deviations are extracted from the columns, e.g. each cell containing values separated by semicolons is split into single values. The appropriate unit is chosen (Da or ppm, depending on ITMS or FTMS data). Also the fragmentation type can be used: CID indicates ITMS, HCD indicates FTMS. This is not 100% reliable, however.
Sometimes, peptides are identified purely based on MS1, i.e. have no fragments. These will be ignored.
If ppm mass deviations are not available, errors in Da will be converted to ppm using the corresponding mass values.
Data frame with mass errors ('msErr') and their 'unit' (Da or ppm) or NULL (if no fragments were given)
Create an HTML table with an extra header row
getHTMLTable(data, caption = NA)
data |
A data.frame which serves as table |
caption |
A set of headlines, e.g. c("top line", "bottom line") |
table as html character string for cat()'ing into an html document
data = data.frame(raw.file = letters[1:4], id.rate = 3:6)
getHTMLTable(data, caption = "some header line")
A vector of booleans is returned with the same length as input which contains TRUE when there is a maximum. Simply sum up the vector to get the number of maxima.
getMaxima(x, thresh_rel = 0.2)
x |
Vector of numbers |
thresh_rel |
Minimum relative intensity to maximum intensity of 'x' required to be a maximum (i.e., a noise threshold). Default is 20%. |
Vector of bool's, where TRUE indicates a local maximum.
r = getMaxima(c(1,0,3,4,5,0))
all(r == c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE))
getMaxima(c(1, NA, 3, 2, 3, NA, 4, 2, 5))
Extract meta information (orderNr, metric name, category) from a list of Qc metric objects
getMetaData(lst_qcMetrics)
lst_qcMetrics |
List of qcMetrics |
data.frame with columns 'name', 'order' and 'cat' (category)
Get all currently available metrics
getMetricsObjects(DEBUG_PTXQC = FALSE)
DEBUG_PTXQC |
Use qc objects from the package (FALSE) or from environment (TRUE/DEBUG) |
List of metric objects
If the file has the param, then return it as string. If the file is missing, warning is shown and NULL is returned. If the param (i.e. XML tag) is unknown or cannot be extracted, the program will quit (since this is a hard error). When multiple occurrences of the param are found (usually due to parameter groups), we test if the values are all identical. If so, the value is returned. If the values are different, a warning is emitted and NULL is returned unless 'allow_multiple = TRUE'
getMQPARValue(mqpar_filename, xpath, allow_multiple = FALSE)
mqpar_filename |
Filename (incl. absolute or relative path) to the mqpar.xml file |
xpath |
An XPath to extract the content of XML tag(s), e.g. '//firstSearchTol' |
allow_multiple |
If the XPath expression returns more than one value, all values must be identical (not allowing multiple different values) or 'stop()' is called |
E.g. calling getMQPARValue("mqpar.xml", "//firstSearchTol") will look up the line <firstSearchTol>20</firstSearchTol> and return "20" (string!).
The stored value as string(!)
Create a principal component analysis (PCA) plot for the first two dimensions.
getPCA(data, do_plot = TRUE, connect_line_order = NA, gg_layer)
data |
Matrix(!) where each row is one high-dimensional point, with ncol dimensions, e.g. a mouse as an array of protein expressions. rownames(data) give classes for colouring (can be duplicates in matrices, as opposed to data.frames) |
do_plot |
Show PCA plot? If ==2, a correlation plot is shown as well |
connect_line_order |
Connect points by lines, the order is given by this vector. Default: NA (no lines) |
gg_layer |
More parameters added to a ggplot object (ggplot(x) + gg_layer) |
[invisible] Named list with "PCA": the PCA object as returned by prcomp (access $x for PC values) and "plots": a list of plot objects (one or two)
Required columns are "fc.raw.file", "modified.sequence" and "is.transferred".
getPeptideCounts(df_evd)
df_evd |
Data.frame of evidence.txt as read by MQDataReader |
If match-between-runs was enabled during the MaxQuant run, the data.frame returned will contain separate values for 'transferred' evidence plus an 'MBRgain' column, which will give the extra MBR evidence in percent.
Data.frame with columns 'fc.raw.file', 'counts', 'category', 'MBRgain'
Required columns are "protein.group.ids", "fc.raw.file" and "is.transferred".
getProteinCounts(df_evd)
df_evd |
Data.frame of evidence.txt as read by MQDataReader |
If match-between-runs was enabled during the MaxQuant run, the data.frame returned will contain separate values for 'transferred' evidence plus an 'MBRgain' column, which will give the extra MBR evidence in percent.
Data.frame with columns 'fc.raw.file', 'counts', 'category', 'MBRgain'
Each list entry is a data.frame with two columns. The first one contains the Raw file name (or the short version) and should be named 'raw.file' (or 'fc.raw.file'). The second column's name must be an expression (see ?plotmath) and contains quality values in the range [0,1]. If values are outside this range, a warning is issued and values are cut to the nearest allowed value (e.g. '1.2' becomes '1'). List entries are merged and columns are ordered by name.
All substrings enclosed by 'X[0-9]*X.' will be removed (can be used for sorting columns). The resulting string is evaluated as an expression. E.g. parse(text = <colname>)
getQCHeatMap(lst_qcMetrics, raw_file_mapping)
lst_qcMetrics |
List of QCMetric objects |
raw_file_mapping |
Data.frame with 'from' and 'to' columns for name mapping to unify names from list entries |
To judge the overall quality of each raw file a summary column is added, values being the mean of all other columns per row.
A ggplot object for printing
You can combine **report_name_has_folder** (and **mzTab_filename** for mzTab files) to obtain report filenames which are even more robust to moving around (since they contain infixes of the mzTab filename and the folder), e.g. 'report_HEK293-study_myProjects.html', where the input was mzTab_filename = 'HEK293-study.mzTab' and folder = 'c:/somePath/myProjects/'.
getReportFilenames( folder, report_name_has_folder = TRUE, mzTab_filename = NULL )
folder |
Directory where the MaxQuant output (txt folder) or the mzTab file resides |
report_name_has_folder |
Boolean: Should the report files (html, pdf) contain the name of the deepest(=last) subdirectory in **txt_folder** which is not 'txt'? Useful for discerning different reports in a PDF viewer. E.g. when flag is FALSE: 'report_v0.91.0.html'; and 'report_v0.91.0_bloodStudy.html' when flag is TRUE (and the txt folder is '.../bloodStudy/txt/' or '...bloodStudy/') |
mzTab_filename |
If input is an mzTab, specify its name, so that the filenames can use its basename as infix. E.g. when mzTab_filename = 'HEK293-study.mzTab' then the output will be 'report_HEK293-study.html'. This allows getting reports on multiple mzTabs in the same folder without overwriting report results. |
List of output file names (just names, no file is created) with list entries: **yaml_file**, **heatmap_values_file**, **R_plots_file**, **filename_sorting**, **mzQC_file**, **log_file**, **report_file_prefix**, **report_file_PDF**, **report_file_HTML**
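A minimal sketch (the folder path is a placeholder; no files are created by this call):
fn = getReportFilenames("c:/data/Hek293/txt")
fn$yaml_file          ## full path of the YAML configuration to be written
fn$report_file_HTML   ## full path of the HTML report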
Get an mzQC runQuality without actual metrics, but with full metadata
getRunQualityTemplate(fc.raw.file, raw_file_mapping)
fc.raw.file |
For which run |
raw_file_mapping |
A data.frame with cols 'from', 'to' and maybe 'best.effort' (if shortening was unsuccessful), as e.g. obtained by a FilenameMapper$raw_file_mapping |
An MzQCrunQuality object
By default, 20 labels (or up to 40, see below) are shown. If the number of items is less than twice the number of desired labels, all labels will be shown (to avoid irregular holes for some labels). I.e. if n=20 and x has 22 entries, only two labels would be removed, giving a very irregular picture. It only becomes somewhat regular if after any label there is at least one blank, i.e. at most half the entries are labeled.
Example:
## p is any ggplot object
p + scale_y_discrete(breaks = ggAxisLabels)
## customize 'n'
my.ggAxisLabels = function(x) ggAxisLabels(x, n = 4)
p + scale_y_discrete(breaks = my.ggAxisLabels)
ggAxisLabels(x, n = 20)
x |
Vector of labels (passed by GGplot) |
n |
Number of labels to show |
Shortened version of 'x'
Plot a text as graphic using ggplot2.
ggText(title, text, col = "black")
title |
The title of the plot |
text |
Centered text, can contain linebreaks |
col |
Colour of text (excluding the title) |
ggplot object
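A minimal sketch:
ggText("A title", "A message\nspanning two lines", col = "darkred")  ## returns a ggplot object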
The parameter 'value' should not be passed to this function since it is passed internally already.
grepv(reg, data, ...)
reg |
regex param |
data |
container |
... |
other params forwarded to grep() |
values of data which matched the regex
grepv("x", c("abc", "xyz")) ## --> "xyz"
grepv("x", c("abc", "xyz")) ## --> "xyz"
The input is a data.frame containing feature evidence with corrected retention times, e.g. a 'calibrated.retention.time' column.
idTransferCheck(df_evd_all)
df_evd_all |
A data.frame with columns 'type', 'calibrated.retention.time', 'modified.sequence', 'charge', 'raw.file' |
Note that this function must be given MS/MS identifications of type "MULTI-MSMS" and "MSMS-MATCH". It will stop() otherwise.
We compare for each peptide sequence (and charge) the RT difference within groups of either genuine as well as mixed pairs. For every comparison made, we report the RT span. If alignment worked perfectly, the spans are very small (<1 min) for the mixed group, i.e. the pairs are accidentally split 3D peaks. Alignment performance has no influence on the genuine-only groups.
Note: We found early MaxQuant versions (e.g. 1.2.2.5) to have an empty 'modified.sequence' column for 'MULTI-MATCH' entries. The sequence which SHOULD be present is equal to the immediate upper row. This is what we use to guess the sequence. However, this relies on the data.frame not being subsetted before (we can sort using the 'id' column)!
A data.frame containing the RT diff for each ID-group found in a Raw file (bg = genuine).
Looking at groups only: Compute the fraction of 3D-peak pair groups per Raw file which have an acceptable RT difference after alignment using the result from 'idTransferCheck()', i.e. compute the fraction of groups which are within a certain RT tolerance.
inMatchWindow(data, df.allowed.deltaRT)
data |
A data.frame with columns 'fc.raw.file', 'rtdiff_mixed', 'rtdiff_genuine' |
df.allowed.deltaRT |
The allowed matching difference for each Raw file (as data.frame(fc.rawfile, m)) |
Returned value is between 0 (bad) and 1 (all within tolerance).
A data.frame with one row for each raw.file and columns 'raw.file' and score 'withinRT' (0-1)
Count the number of chars of the longest common prefix
lcpCount(x)
x |
Vector of strings with common prefix |
Length of LCP
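A quick worked example:
lcpCount(c("ABC.123", "ABC.456"))  ## --> 4 (the common prefix is "ABC.")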
Implementation is very inefficient (dynamic programming in R) --> use only on small instances
LCS(s1, s2)
s1 |
String one |
s2 |
String two |
String containing the longest common substring
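A quick worked example:
LCS("ABCDE", "XXBCDYY")  ## --> "BCD"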
Count the number of chars of the longest common suffix
lcsCount(x)
x |
Vector of strings with common suffix |
Length of LCS
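A quick worked example:
lcsCount(c("123_H1.raw", "456_H1.raw"))  ## --> 7 (the common suffix is "_H1.raw")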
Warning: greedy heuristic! This is not guaranteed to find the best solution (or any solution at all), since it's done pairwise with the shortest input string as reference.
LCSn(strings, min_LCS_length = 0)
strings |
A vector of strings in which to search for LCS |
min_LCS_length |
Minimum length expected. Empty string is returned if the result is shorter |
Longest common substring (or "" if shorter than min_LCS_length)
LCSn(c("1_abcde...", "2_abcd...", "x_abc...")) ## --> "_abc" LCSn(c("16_IMU008_CISPLA_E5_R11", "48_IMU008_CISPLA_P4_E7_R31", "60_IMU008_CISPLA_E7_R11"), 3) ## -->"_IMU008_CISPLA_" LCSn(c("AAAAACBBBBB", "AAAAADBBBBB", "AAAABBBBBEF", "AAABBBBBDGH")) ## --> "BBBBB" LCSn(c("AAAXXBBB", "BBBXXDDD", "XXAAADDD")) ## --> fails due to greedy approach; should be "XX"
LCSn(c("1_abcde...", "2_abcd...", "x_abc...")) ## --> "_abc" LCSn(c("16_IMU008_CISPLA_E5_R11", "48_IMU008_CISPLA_P4_E7_R31", "60_IMU008_CISPLA_E7_R11"), 3) ## -->"_IMU008_CISPLA_" LCSn(c("AAAAACBBBBB", "AAAAADBBBBB", "AAAABBBBBEF", "AAABBBBBDGH")) ## --> "BBBBB" LCSn(c("AAAXXBBB", "BBBXXDDD", "XXAAADDD")) ## --> fails due to greedy approach; should be "XX"
Input is converted to character (e.g. from factor) first.
longestCommonPrefix(strings)
strings |
Vector of strings |
Single string - might be empty ("")
longestCommonPrefix(c("CBA.321", "CBA.77654", "")) ## "" longestCommonPrefix(c("CBA.321", "CBA.77654", "CB")) ## "CB" longestCommonPrefix(c("ABC.123", "ABC.456")) ## "ABC." longestCommonPrefix(c("nothing", "in", "common")) ## ""
longestCommonPrefix(c("CBA.321", "CBA.77654", "")) ## "" longestCommonPrefix(c("CBA.321", "CBA.77654", "CB")) ## "CB" longestCommonPrefix(c("ABC.123", "ABC.456")) ## "ABC." longestCommonPrefix(c("nothing", "in", "common")) ## ""
Like longestCommonPrefix(), but on the suffix.
longestCommonSuffix(strings)
strings |
Vector of strings |
Single string - might be empty ("")
longestCommonSuffix(c("123.ABC", "45677.ABC", "BC")) ## "BC" longestCommonSuffix(c("123.ABC", "", "BC")) ## "" longestCommonSuffix(c("123.ABC", "45677.ABC")) ## ".ABC" longestCommonSuffix(c("nothing", "in", "common")) ## ""
longestCommonSuffix(c("123.ABC", "45677.ABC", "BC")) ## "BC" longestCommonSuffix(c("123.ABC", "", "BC")) ## "" longestCommonSuffix(c("123.ABC", "45677.ABC")) ## ".ABC" longestCommonSuffix(c("nothing", "in", "common")) ## ""
Convert list of (mixed)modifications to a frequency table
modsToTable(mod_list)
mod_list |
A vector with modifications, each for a specific peptide. Multiple mods per entry are allowed, each separated by comma. |
A data.frame with 'modification_names' and 'Freq' (0-100)
modsToTable(c("Ox (M)", "Unmodified", "Ox (M),Acetyl (Prot N-term)", "2 Ox (M)", "Unmodified", "Unmodified"))
modsToTable(c("Ox (M)", "Unmodified", "Ox (M),Acetyl (Prot N-term)", "2 Ox (M)", "Unmodified", "Unmodified"))
Convert list of (mixed)modifications to a frequency table
modsToTableByRaw( df_evd, name_unmod = "Unmodified", name_unmod_inverse = "Modified (total)" )
df_evd |
data.frame with 'fc.raw.file' and a 'modifications' column, which contains the modifications for each peptide. |
name_unmod |
String in 'modifications' which represents an unmodified peptide |
name_unmod_inverse |
If non-empty, then inverse the frequencies of the 'name_unmod' modifications (i.e. 100-x) IFF they are >=50% on average (across Raw files) and rename them to this string |
A data.table with 'fc.raw.file', 'modification_names' (factor), and 'Freq' (0-100)
data = data.frame(fc.raw.file = rep(c("file A", "file B"), each = 3), modifications = c("Oxidation (M)", "Unmodified", "Oxidation (M),Acetyl (Protein N-term)", "2 Oxidation (M)", "Unmodified", "Unmodified")) modsToTableByRaw(data)
data = data.frame(fc.raw.file = rep(c("file A", "file B"), each = 3), modifications = c("Oxidation (M)", "Unmodified", "Oxidation (M),Acetyl (Protein N-term)", "2 Oxidation (M)", "Unmodified", "Unmodified")) modsToTableByRaw(data)
Found at http://stackoverflow.com/questions/19233365/how-to-create-a-marimekko-mosaic-plot-in-ggplot2 Modified (e.g. to pass R check)
mosaicize(data)
data |
A data.frame with exactly two columns |
Returns a data frame which can be used for plotting and has the following columns:
'Var1' - marginalized values from 1st input column
'Var2' - marginalized values from 2nd input column
'Freq' - relative frequency of the combination given in [Var1, Var2]
'margin_var1' - frequency of the value given in Var1
'var2_height' - frequency of the value given in Var2, relative to Var1
'var1_center' - X-position when plotting (large sets get a larger share)
Data.frame
data = data.frame(raw.file = c(rep('file A', 100), rep('file B', 40)),
                  charge = c(rep(2, 60), rep(3, 30), rep(4, 10), rep(2, 30), rep(3, 7), rep(4, 3)))
mosaicize(data)
This class is used to read MQ data tables using MQDataReader::readMQ() while holding the internal raw file --> short raw file name mapping (stored in a member called 'fn_map') and updating/using it every time MQDataReader::readMQ() is called.
file |
(Relative) path to a MQ txt file. |
filter |
Searched for "C" and "R". If present, [c]ontaminants and [r]everse hits are removed if the respective columns are present.
E.g. to filter both, |
type |
Allowed values are:
"pg" (proteinGroups) [default], adds abundance index columns (*AbInd*, replacing 'intensity')
"sm" (summary), splits into three row subsets (raw.file, condition, total)
"ev" (evidence), will fix empty modified.sequence cells for older MQ versions (when MBR is active)
"msms_scans", will fix invalid (negative) scan event numbers
Any other value will not add/modify any columns |
col_subset |
A vector of column names as read by read.delim(), e.g., spaces are replaced by dot already. If given, only columns with these names (ignoring lower/uppercase) will be returned (regex allowed) E.g. col_subset=c("^lfq.intensity.", "protein.name") |
add_fs_col |
If TRUE and a column 'raw.file' is present, an additional column 'fc.raw.file' will be added with common prefix AND common substrings removed |
check_invalid_lines |
After reading the data, check for unusual number of NA's to detect if file was corrupted by Excel or alike |
LFQ_action |
[For type=='pg' only] An additional custom LFQ column ('cLFQ...') is created, where zero values in LFQ columns are replaced by the following method IFF(!) the corresponding raw intensity is >0 (indicating that LFQ is erroneously 0): "toNA": replace by NA; "impute": replace by lowest LFQ value >0 (simulating 'noise') |
... |
Additional parameters passed on to read.delim() |
colname |
Name of the column (e.g. 'contaminants') in the mq.data table |
valid_entries |
Vector of values to be replaced (must contain all values expected in the column – fails otherwise) |
replacements |
Vector of replacement values, with the same length as valid_entries |
Since MaxQuant changes capitalization and sometimes even column names, it seemed convenient to have a function which just reads a txt file and returns unified column names, irrespective of the MQ version. So, it unifies access to columns (e.g. by using lower case for ALL columns) and ensures columns are identically named across MQ versions:
alternative term --> new term
protease --> enzyme
protein.descriptions --> fasta.headers
potential.contaminant --> contaminant
mass.deviations --> mass.deviations..da.
basepeak.intensity --> base.peak.intensity
We also correct 'reporter.intensity.*' naming issues to MQ 1.6 convention, when 'reporter.intensity.not.corrected' is present.
MQ 1.5 uses: reporter.intensity.X and reporter.intensity.not.corrected.X
MQ 1.6 uses: reporter.intensity.X and reporter.intensity.corrected.X
Note: you must find a regex which matches both versions, or explicitly add both terms if you are requesting only a subset of columns!
Fixes for msmsScans.txt: negative Scan Event Numbers in msmsScans.txt are reconstructed by using other columns
Automatically detects UTF8-BOM encoding and deals with it (since MQ2.4).
Example of usage:
mq = MQDataReader$new()
d_evd = mq$readMQ("evidence.txt", type="ev", filter="R", col_subset=c("proteins", "Retention.Length", "retention.time.calibration"))
If the file is empty, this function shows a warning and returns NULL. If the file is present but cannot be read, the program will stop.
Wrapper to read a MQ txt file (e.g. proteinGroups.txt).
A data.frame of the respective file
Replaces values in the mq.data member with (binary) values. Most MQ tables contain columns like 'contaminants' or 'reverse', whose values are either empty strings or "+", which is inconvenient and can be much better represented as TRUE/FALSE. The params valid_entries and replacements contain the matched pairs, which determine what is replaced with what. Returns TRUE if successful.
getInvalidLines()
Detect broken lines (e.g. due to Excel import+export)
When editing a MQ txt file in Microsoft Excel, saving the file can cause it to be corrupted, since Excel has a single cell content limit of 32k characters (see http://office.microsoft.com/en-001/excel-help/excel-specifications-and-limits-HP010342495.aspx) while MQ can easily reach 60k (e.g. in oxidation sites column). Thus, affected cells will trigger a line break, effectively splitting one line into two (or more).
If the table has an 'id' column, we can simply check that the numbers are consecutive. If no 'id' column is available, we detect line breaks by counting the number of NA's per row and finding outliers. The line break then must be in this line (plus the preceding or following one). Depending on where the break happened we can also detect both lines right away (if both have more NA's than expected).
Currently, we have no good strategy to fix the problem since columns are not aligned any longer, which leads to columns not having the class (e.g. numeric) they should have. (thus one would need to un-do the linebreak and read the whole file again)
[Solution to the problem: try LibreOffice 4.0.x or above – seems not to have this limitation]
Returns a vector of indices of broken (i.e. invalid) lines
The 'sections' field is initialized after $readMzTab was called. The 'fn_map' fields should be initialized via ...$fn_map$readMappingFile(...) manually if user-defined filename mappings are desired and is automatically updated/queried when $readMzTab is called.
sections
MzTab sections as list. Valid list entries are: "MTD", "PRT", "PEP", "PSM", "SML", "filename" and "comments"
fn_map
FilenameMapper which can translate raw filenames into something shorter
RTUnitCorrection(dt)
Convert all RT columns from seconds (OpenMS default) to minutes (MaxQuant default)
getEvidence()
Basically the PSM table and additionally columns named 'raw.file' and 'fc.raw.file'.
getMSMSScans(identified_only = FALSE)
Basically the PSM table (partially renamed columns) and additionally two columns 'raw.file' and 'fc.raw.file'. If identified_only is TRUE, only MS2 scans which were identified (i.e. a PSM) are returned – this is equivalent to msms.txt in MaxQuant.
getParameters()
Converts internal mzTab metadata section to a two column key-value data.frame similar to MaxQuant's parameters.txt.
getProteins()
Basically the PRT table ...
getSummary()
Converts internal mzTab metadata section to a data.frame with columns 'fc.raw.file', 'ms.ms.identified....' similar to MaxQuant's summary.txt.
renameColumns(dt, namelist)
Renames all columns and throws a warning if a column does not exist in the data
paste with newline as separator
pasten(...)
... |
Arguments forwarded to paste() |
return value of paste()
pasten("newline","separated") ## --> "newline\nseparated"
pasten("newline","separated") ## --> "newline\nseparated"
paste with tab as separator
pastet(...)
... |
Arguments forwarded to paste() |
return value of paste()
pastet("tab","separated") ## --> "tab\tseparated"
pastet("tab","separated") ## --> "tab\tseparated"
Sometimes, MQ splits a feature into 2 or more if the chromatographic conditions are not optimal and there is a drop in RT intensity. If both features contain successful MS/MS scans, we will find the same peptide twice (with slightly different RT) in the same charge state. This constitutes a natively split peak and is rare (95%).
peakSegmentation(df_evd_all)
df_evd_all |
A data.frame of evidences containing the above columns |
If Match-between-runs is used and the RT alignment is not perfect, then a peptide might be inferred at a wrong RT position, even though this Raw file already contains MS/MS evidence of this peptide. Usually the number of peak duplicates rises drastically (e.g. only 75%). In most cases, the RT is too far off to be a split peak. It's rather a lucky hit with accidentally the same mass-to-charge, and thus the intensity is random. To find by how much these peak pairs differ in RT, use idTransferCheck() and inMatchWindow().
Required columns are 'is.transferred', 'fc.raw.file', 'modified.sequence', 'charge', 'type'.
Note that this function must be given MS/MS identifications of type "MULTI-MSMS" and "MSMS-MATCH". It will stop() otherwise.
A data.frame with one row per Raw file and three columns: 1) 2) 3)
Should be applied for each Raw file individually.
peakWidthOverTime(data, RT_bin_width = 2)
data |
Data.frame with columns 'retention.time' and 'retention.length' |
RT_bin_width |
Bin size in minutes |
Returns a data.frame, where 'bin' gives the index of each bin, 'RT' is the middle of each bin and 'peakWidth' is the averaged peak width per bin.
Data.frame with columns 'bin', 'RT', 'peakWidth'
data = data.frame(retention.time = seq(30, 200, by = 0.001)) ## one MS/MS per 0.1 sec
data$retention.length = seq(0.3, 0.6, length.out = nrow(data)) + rnorm(nrow(data), 0, 0.1)
d = peakWidthOverTime(data)
plot(d$RT, d$peakWidth)
Boxes are optionally colored to indicate that a MQ bug was detected or if PTXQC detected a too narrow search window.
plot_CalibratedMSErr( data, MQBug_raw_files, stats, y_lim, extra_limit = NA, title_sub = "" )
data |
A data.frame with columns 'fc.raw.file', 'mass.error..ppm.' |
MQBug_raw_files |
List of Raw files with invalid calibration values |
stats |
A data.frame with columns 'fc.raw.file', 'outOfCal' |
y_lim |
Range of y-axis |
extra_limit |
Position where a v-line is plotted (for visual guidance) |
title_sub |
Subtitle |
GGplot object
n = c(150, 1000, 1000, 1000)
data = data.frame(fc.raw.file = repEach(letters[4:1], n),
                  mass.error..ppm. = c(rnorm(n[1], 1, 2.4), rnorm(n[2], 0.5, 0.5), rnorm(n[3], 0.1, 0.7), rnorm(n[4], 0.3, 0.8)))
stats = data.frame(fc.raw.file = letters[4:1],
                   sd = c(2.4, 0.5, 0.7, 0.8),
                   outOfCal = c(TRUE, FALSE, FALSE, FALSE))
plot_CalibratedMSErr(data, MQBug_raw_files = letters[1], stats, y_lim = c(-20,20), 15, "subtitle")
The input is a data.frame with columns
'Var1' - name of the Raw file
'Var2' - charge (used as fill color)
'Var1_center' - contains X-position of the Raw file
'Var2_height' - relative frequency of the charge
'Margin_var1'
where each row represents one peptide sequence.
plot_Charge(d_charge)
d_charge |
A data.frame with columns as described above |
GGplot object
data = data.frame(raw.file = c(rep('file A', 100), rep('file B', 40)),
                  data = c(rep(2, 60), rep(3, 30), rep(4, 10), rep(2, 30), rep(3, 7), rep(4, 3)))
plot_Charge(mosaicize(data))
Plot contaminants from evidence.txt, broken down into top5-proteins.
plot_ContEVD(data, top5)
data |
A data.frame with columns 'fc.raw.file', 'contaminant', 'pname', 'intensity' |
top5 |
Name of the Top-5 Proteins (by relative intensity or whatever seems relevant) |
GGplot object
data = data.frame(intensity = 1:12,
                  pname = rep(letters[1:3], 4),
                  fc.raw.file = rep(paste("f", 1:4), each=3),
                  contaminant = TRUE)
## providing more proteins than present... d,e will be ignored
plot_ContEVD(data, top5 = letters[1:5])
## classify 'c' as 'other'
plot_ContEVD(data, top5 = letters[1:2])
Plot contaminants from proteinGroups.txt
plot_ContsPG(data)
data |
A data.frame with columns 'group', 'cont_pc', 'logAbdClass' |
GGplot object
data = data.frame('group' = letters[1:10],
                  'cont_pc' = 2:11,
                  'logAbdClass' = c("low","high"))
plot_ContsPG(data)
Kolmogorov-Smirnov p-values are plotted on top of each group. High p-values indicate that Andromeda scores for contaminant peptides are equal or higher compared to sample peptide scores, i.e. the probability that sample peptide scores are NOT greater than contaminant peptide scores.
plot_ContUser(data, name_contaminant, extra_limit, subtitle = NULL)
data |
A data.frame with columns 'fc.raw.file', 'variable', 'value' |
name_contaminant |
Name of the contaminant shown in title |
extra_limit |
Position where a h-line is plotted (for visual guidance) |
subtitle |
Optional subtitle for plot |
GGplot object
data = data.frame(fc.raw.file = letters[1:3],
                  variable = c(rep("spectralCount", 3), rep("intensity", 3), rep("above.thresh", 3), rep("score_KS", 3)),
                  value = c(10, 20, 15, 9, 21, 14, 0, 1, 1, 0.3, 0.01, 0.04))
plot_ContUser(data, "myco", 5, "subtitle")
The data is expected to be an ECDF already, x being the Andromeda score, y being the cumulative probability. The score is the p-value of a Kolmogorov-Smirnov test that the contaminant scores are larger (i.e. large p-values indicate true contamination). You will only see this plot if there are only a few, but high-scoring contaminant peptides, which would erroneously give you a large p-value and make you believe your sample is contaminated although that's not the case.
plot_ContUserScore(data, raw.file, score)
data |
A data.frame with columns 'x', 'y', 'condition' |
raw.file |
Name of Raw file for which the data is displayed (will become part of the plot title) |
score |
Score of how distinct the distributions are (will become part of the title) |
GGplot object
data = data.frame(x = 10:60,
                  y = c(seq(0,1,length=51), seq(0.1, 1, length=51)),
                  condition = rep(c("sample","contaminant"), each=51))
plot_ContUserScore(data, 'test file', 0.96)
The input is a data.frame with protein/peptide counts, where 'category' designates the origin of information (genuine ID, transferred ID, or both).
plot_CountData(data, y_max, thresh_line, title)
data |
A data.frame with columns 'fc.raw.file', 'counts', 'category' |
y_max |
Plot limit of y-axis |
thresh_line |
Position of a threshold line, indicating the usual target value |
title |
Main title, and optional subtitle (if vector of length 2 is provided) |
GGplot object
data = data.frame(fc.raw.file = rep(c("file A", "file B"), each=3), counts = c(3674, 593, 1120, 2300, 400, 600), category = c("genuine","genuine+transferred","transferred")) plot_CountData(data, 6000, 4000, c("EVD: Protein Groups count", "gain: 23%"))
data = data.frame(fc.raw.file = rep(c("file A", "file B"), each=3), counts = c(3674, 593, 1120, 2300, 400, 600), category = c("genuine","genuine+transferred","transferred")) plot_CountData(data, 6000, 4000, c("EVD: Protein Groups count", "gain: 23%"))
The input is a data.frame with columns
'RT' - RT in seconds, representing one bin
'counts' - number of counts at this bin
'fc.raw.file' - name of the Raw file
where each row represents one bin in RT.
plot_DataOverRT( data, title, y_lab, x_lim = range(data$RT), y_max = max(data$counts) )
data |
A data.frame with columns as described above |
title |
The plot title |
y_lab |
Label of y-axis |
x_lim |
Limits of the x-axis (2-tuple) |
y_max |
Maximum of the y-axis (single value) |
At most nine(!) Raw files can be plotted. If more are given, an error is thrown.
GGplot object
data = data.frame(fc.raw.file = rep(paste('file', letters[1:3]), each=30),
                  RT = seq(20, 120, length.out = 30),
                  counts = c(rnorm(30, 400, 20), rnorm(30, 250, 15), rnorm(30, 50, 15)))
plot_DataOverRT(data, "some title", "count data")
Useful for a first overall impression of the data.
plot_IDRate(data, id_rate_bad, id_rate_great, label_ID)
data |
A data.frame with columns as described above |
id_rate_bad |
Number below which the ID rate is considered bad |
id_rate_great |
Number above which the ID rate is considered great |
label_ID |
Named vector with colors for the categories given in data$cat |
The input is a data.frame where each row represents one Raw file, with columns 'fc.raw.file' (name of the Raw file), 'ms.ms.identified....' (fraction of identified MS/MS spectra in percent) and 'cat' (identification category as arbitrary string).
GGplot object
id_rate_bad = 20; id_rate_great = 35; label_ID = c("bad (<20%)" = "red", "ok (...)" = "blue", "great (>35%)" = "green") data = data.frame(fc.raw.file = paste('file', letters[1:3]), ms.ms.identified.... = rnorm(3, 25, 15)) data$cat = factor(cut(data$ms.ms.identified...., breaks=c(-1, id_rate_bad, id_rate_great, 100), labels=names(label_ID))) plot_IDRate(data, id_rate_bad, id_rate_great, label_ID)
Uses plot_DataOverRT() internally.
plot_IDsOverRT(data, x_lim = range(data$RT), y_max = max(data$counts))
data |
A data.frame with columns as described above |
x_lim |
Limits of the x-axis (2-tuple) |
y_max |
Maximum of the y-axis (single value) |
GGplot object
data = data.frame(fc.raw.file = rep(paste('file', letters[1:3]), each=30), RT = seq(20, 120, length.out = 30), counts = c(rnorm(30, 400, 20), rnorm(30, 250, 15), rnorm(30, 50, 15))) plot_IDsOverRT(data)
At most six Raw files can be plotted; the function will stop otherwise.
plot_IonInjectionTimeOverRT(data, stats, extra_limit)
data |
A data.frame with columns 'fc.raw.file', 'rRT', 'medIIT' |
stats |
A data.frame with columns 'fc.raw.file', 'mean' |
extra_limit |
Visual guidance line (maximum acceptable IIT) |
GGplot object
data = data.frame(fc.raw.file = rep(c("d","a","x"), each=100), rRT = seq(20, 120, length.out = 100), medIIT = c(round(runif(100, min=3, max=5)), round(runif(100, min=5, max=8)), round(runif(100, min=1, max=3))) ) stats = data.frame(fc.raw.file = c("d","a","x"), mean = c(4, 6.5, 2)) plot_IonInjectionTimeOverRT(data, stats, 10)
The plot shows the correction function applied by MaxQuant and the residual RT difference (ideally 0) of each peptide to its reference. Uncalibrated peptides are shown in red, calibrated ones in green. The MaxQuant RT correction which was applied beforehand is shown in blue. The range of this function gives hints whether the allowed RT search window (20 min by default) is sufficient or whether MaxQuant should be re-run with more tolerant settings.
plot_MBRAlign(data, y_lim, title_sub, match_tol)
data |
A data.frame with columns as described above |
y_lim |
Plot range of y-axis |
title_sub |
Subtitle |
match_tol |
Maximal residual RT delta to reference (usually ~1 min) |
The input is a data.frame where each row represents one peptide whose RT was corrected by MaxQuant, with columns 'calibrated.retention.time' (the resulting, hopefully calibrated, RT after MaxQuant recalibration; the x-axis of the plot), 'retention.time.calibration' (the delta applied by MaxQuant), 'rtdiff' (the remaining RT difference to the reference peptide of the same sequence), 'RTdiff_in' (is the feature aligned, i.e. within 'match_tol'?) and 'fc.raw.file_ext' (the Raw file).
GGplot object
data = data.frame(fc.raw.file_ext = "file A", ## more than one would be possible calibrated.retention.time = c(20:100), retention.time.calibration = 6 + sin((20:100)/10)) data$rtdiff = rnorm(nrow(data)) data$RTdiff_in = c("green", "red")[1 + (abs(data$rtdiff) > 0.7)] plot_MBRAlign(data, c(-10, 10), "fancy subtitle", 0.7)
The plot shows the different categories of peak classes.
plot_MBRgain(data, title_sub = "")
data |
A data.frame with columns as described above |
title_sub |
Subtitle text |
The input is a data.frame where each row represents one peptide sequence, with columns 'fc.raw.file' (Raw file name), 'single' (fraction of peptides which are represented only once), 'multi.inRT' (fraction of peptides which are represented multiple times, but within a certain RT peak width), 'multi.outRT' (fraction of peptides which are represented multiple times, with large RT distance) and 'sample' (sample/category label).
GGplot object
data = data.frame(fc.raw.file = paste("file", letters[1:4]), abs = c(5461, 5312, 3618, 502), pc = c(34, 32, 22, 2)) plot_MBRgain(data, "MBR gain: 18%")
The plot shows the different categories of peak classes.
plot_MBRIDtransfer(data)
data |
A data.frame with columns as described above |
The input is a data.frame where each row represents one peptide sequence, with columns 'fc.raw.file' (Raw file name), 'single' (fraction of peptides which are represented only once), 'multi.inRT' (fraction of peptides which are represented multiple times, but within a certain RT peak width), 'multi.outRT' (fraction of peptides which are represented multiple times, with large RT distance) and 'sample' (sample/category label).
GGplot object
data = data.frame(fc.raw.file = rep(c("file A", "file B"), each = 3), single = c(0.9853628, 0.8323160, 0.9438375, 0.9825538, 0.8003763, 0.9329961), multi.inRT = c(0.002927445, 0.055101018, 0.017593087, 0.005636457, 0.099640044, 0.031870056), multi.outRT = c(0.01170978, 0.11258294, 0.03856946, 0.01180972, 0.09998363, 0.03513386), sample = rep(c("genuine", "transferred", "all"), 2)) plot_MBRIDtransfer(data)
Per Raw file, an arbitrary number of missed cleavage classes (one per column) can be given. The total fraction of 3D-peaks must sum to 1 (=100%). Columns are ordered by name.
plot_MissedCleavages(data, title_sub = "")
data |
A data.frame with columns 'fc.raw.file', '...' (missed cleavage classes) |
title_sub |
Plot's subtitle |
A visual threshold line is drawn at 75%.
GGplot object
data = data.frame(fc.raw.file = letters[1:5], MC0 = c(0.8, 0.5, 0.85, 0.2, 0.9), MC1 = c(0.1, 0.4, 0.05, 0.7, 0.0), "MS2+" = c(0.1, 0.1, 0.1, 0.1, 0.1), check.names = FALSE) plot_MissedCleavages(data, "contaminant inclusion unknown")
Per Raw file, at most three n's must be given, i.e. the fraction of 3D-peaks for n=1, n=2 and n=3 (or more). The fractions must sum to 1 (=100%).
plot_MS2Decal(data)
data |
A data.frame with columns 'file', 'msErr', 'type' |
GGplot object
n = c(100, 130, 50) data = data.frame(file = repEach(paste(letters[1:3],"\nLTQ [Da]"), n), msErr = c(rnorm(n[1], 0.5), rnorm(n[2], 0.0), rnorm(n[3], -0.5)), type = c("forward", "decoy")[1+(runif(sum(n))>0.95)]) plot_MS2Decal(data)
Per Raw file, at most three n's must be given, i.e. the fraction of 3D-peaks for n=1, n=2 and n=3 (or more). The fractions must sum to 1 (=100%).
plot_MS2Oversampling(data)
data |
A data.frame with columns 'fc.raw.file', 'n', 'fraction' |
GGplot object
data = data.frame(fc.raw.file = rep(letters[1:3], each=3), n = 1:3, fraction = c(0.8, 0.1, 0.1, 0.6, 0.3, 0.1, 0.7, 0.25, 0.05)) plot_MS2Oversampling(data)
The input is a data.frame, as obtained from modsToTableByRaw().
plot_peptideMods(tbl, y_max = NA, show_missing_modification_levels = TRUE)
tbl |
A data.frame with 'fc.raw.file', 'modification_names' (can be a factor), and 'Freq' (0-100) |
y_max |
The upper limit of the y-axis's (==Freq); useful for multiple plots with identical limits; if 'NA' the limit is computed from the given 'tbl' |
show_missing_modification_levels |
If 'tbl$modification_names' is a factor and has more (but missing) levels than actually used, should missing values be dropped or assumed as '0' frequency? |
GGplot object
data = data.frame(fc.raw.file = rep(c("file A", "file B"), each=3), modifications = c("Oxidation (M)", "Unmodified", "Oxidation (M), Acetyl (Protein N-term)", "2 Oxidation (M)", "Unmodified", "Unmodified")) tbl = modsToTableByRaw(data) plot_peptideMods(tbl,show_missing_modification_levels = TRUE)
The 'x' values are expected to be log2() transformed already.
plot_RatiosPG(df_ratios, d_range, main_title, main_col, legend_title)
df_ratios |
A data.frame with columns 'x', 'y', 'col', 'ltype' |
d_range |
X-axis range of plot |
main_title |
Plot title |
main_col |
Color of title |
legend_title |
Legend text |
GGplot object
x1 = seq(-3, 3, by = 0.1) y1 = dnorm(x1) x2 = seq(-5, 1, by = 0.1) y2 = dnorm(x2, mean = -1) data = data.frame( x = c(x1,x2), y = c(y1,y2), col = c(rep("ok", length(x1)), rep("shifted", length(x2))), ltype = c(rep("solid", length(x1)), rep("dotted", length(x2)))) plot_RatiosPG(data, range(data$x), "Ratio plot", "red", "group")
The input is a data.frame with already averaged counts over binned RT-slices.
plot_RTPeakWidth(data, x_lim, y_lim)
data |
A data.frame with columns 'fc.raw.file', 'RT', 'peakWidth' |
x_lim |
Plot range of x-axis |
y_lim |
Plot range of y-axis |
GGplot object
data = data.frame(fc.raw.file = rep(c("file A", "file B", "file C"), each=81), RT = c(20:100), peakWidth = c(rnorm(81, mean=20), rnorm(81, mean=10), rnorm(81, mean=30))) plot_RTPeakWidth(data, c(10, 100), c(0, 40))
At most six Raw files can be plotted; the function will stop otherwise.
plot_ScanIDRate(data)
data |
A data.frame with columns 'fc.raw.file', 'scan.event.number', 'ratio', 'count' |
GGplot object
data = data.frame(fc.raw.file = factor(rep(c("d","a","x"), each=10), levels = c("d","a","x")), scan.event.number = 1:10, ratio = seq(40, 20, length.out=10), count = seq(400, 200, length.out=10)) plot_ScanIDRate(data)
The input is a data.frame with already averaged counts over binned RT-slices.
plot_TIC(data, x_lim, y_lim)
data |
A data.frame with columns 'fc.raw.file', 'RT', 'intensity' |
x_lim |
Plot range of x-axis |
y_lim |
Plot range of y-axis |
GGplot object
data = data.frame(fc.raw.file = rep(c("file A", "file B", "file C"), each=81), RT = c(20:100), intensity = c(rnorm(81, mean=20), rnorm(81, mean=10), rnorm(81, mean=30))) plot_TIC(data, c(10, 100), c(0, 40))
At most six Raw files can be plotted; the function will stop otherwise.
plot_TopN(data)
data |
A data.frame with columns 'fc.raw.file', 'scan.event.number', 'n' |
GGplot object
data = data.frame(fc.raw.file = rep(c("d","a","x"), each=10), scan.event.number = 1:10, n = 11:20) plot_TopN(data)
At most six Raw files can be plotted; the function will stop otherwise.
plot_TopNoverRT(data)
data |
A data.frame with columns 'fc.raw.file', 'rRT', 'topN' |
GGplot object
data = data.frame(fc.raw.file = rep(letters[1:3], each=100), rRT = seq(20, 120, length.out = 100), topN = c(round(runif(100, min=3, max=5)), round(runif(100, min=5, max=8)), round(runif(100, min=1, max=3))) ) plot_TopNoverRT(data)
Boxes are optionally colored to indicate that an MQ bug was detected or that PTXQC found the search window to be too narrow.
plot_UncalibratedMSErr( data, MQBug_raw_files, stats, y_lim, extra_limit, title_sub )
data |
A data.frame with columns 'fc.raw.file', 'uncalibrated.mass.error..ppm.' |
MQBug_raw_files |
List of Raw files with invalid calibration values |
stats |
A data.frame with columns 'fc.raw.file', 'sd', 'outOfCal' |
y_lim |
Range of y-axis |
extra_limit |
Position where a v-line is plotted (for visual guidance) |
title_sub |
Subtitle |
GGplot object
n = c(150, 1000, 1000, 1000) data = data.frame(fc.raw.file = repEach(letters[4:1], n), uncalibrated.mass.error..ppm. = c(rnorm(n[1], 13, 2.4), rnorm(n[2], 1, 0.5), rnorm(n[3], 3, 0.7), rnorm(n[4], 4.5, 0.8))) stats = data.frame(fc.raw.file = letters[4:1], sd_uncal = c(2.4, 0.5, 0.7, 0.8), outOfCal = c(TRUE, FALSE, FALSE, FALSE)) plot_UncalibratedMSErr(data, MQBug_raw_files = letters[1], stats, y_lim = c(-20,20), 15, "subtitle")
Restriction: currently, the footer will be cropped at the table width.
plotTable( data, title = "", footer = "", col_names = colnames(data), fill = c("grey90", "grey70"), col = "black", just = "centre" )
data |
A data.frame with columns as described above |
title |
Table title |
footer |
Footer text |
col_names |
Column names for Table |
fill |
Fill pattern (by row) |
col |
Text color (by column) |
just |
(ignored) |
gTree object with class 'PTXQC_table'
data = data.frame(raw.file = letters[1:4], id.rate = 3:6) plotTable(data, title = "Bad files", footer = "bottom", col_names = c("first col", "second col"), col=c("red", "green"))
Code taken from http://stackoverflow.com/questions/23819209/change-text-color-for-cells-using-tablegrob-in-r
plotTableRaw(data, colours = "black", fill = NA, just = "centre")
data |
Table as Data.frame |
colours |
Single or set of colours (col-wise) |
fill |
Cell fill (row-wise) |
just |
(ignored) |
gTable
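A minimal usage sketch (not from the original documentation; the data is arbitrary, and drawing the result via grid::grid.draw() is assumed from the gTable return type):
df = data.frame(raw.file = letters[1:3], score = c(0.9, 0.5, 0.7))
tbl = plotTableRaw(df, colours = c("black", "blue"))  ## one colour per column
grid::grid.draw(tbl)                                  ## gTables are drawn with grid, not printed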
Usage: ggplot(...) + geom_X(...) + pointsPutX(...)
pointsPutX(x_range, x_section, y, col = NA)
x_range |
[min,max] valid range of x-values |
x_section |
[min,max] fraction in which to distribute the values (in [0,1] for min,max, e.g. c(0.03,0.08) for 3-8%) |
y |
Y-values |
col |
Colour of the points (used as argument to aes(colour=)) |
ggplot object with new geom_point
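A hypothetical usage sketch (data and placement values are made up), following the usage pattern stated above:
require(ggplot2)
d = data.frame(x = 1:10, y = rnorm(10))
## place two extra points within 3-8% of the x-range, at the given y positions
ggplot(d) + geom_point(aes(x, y)) +
  pointsPutX(x_range = range(d$x), x_section = c(0.03, 0.08), y = c(-1, 1))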
Helper S3 method, enabling print() of a 'PTXQC_table' object (as returned by plotTable()).
## S3 method for class 'PTXQC_table' print(x, ...)
x |
Some Grid object to plot |
... |
Further arguments (not used, but required for consistency with other print methods) |
Define a Singleton class which holds the full raw filenames (+path) and their PSI-MS CV terms for usage in the mzQC metadata
R6P::Singleton
-> QCMetaFilenames
data
Stores the data of the singleton. Set the data once before using the singleton all over the place
clone()
The objects of this class are cloneable with this method.
QCMetaFilenames$clone(deep = FALSE)
deep
Whether to make a deep clone.
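A usage sketch, assuming the usual R6P::Singleton semantics (the stored content below is hypothetical):
meta = QCMetaFilenames$new()                               ## creates (or returns) the single instance
meta$data = list(raw_files = c("file1.raw", "file2.raw"))  ## hypothetical content; set once, early on
identical(QCMetaFilenames$new()$data, meta$data)           ## TRUE -- every $new() yields the same object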
Metric for msmsscans.txt, showing TopN over RT.
Reference class which is instantiated with a metric description and a worker function (at initialization time, i.e. in the package) and can produce plots and mzQC values (at runtime, when data is provided) using setData().
helpText
Description (lengthy) of the metric and plot elements
workerFcn
Function which generates a result (usually plots). Data is provided using setData().
plots
List of plots (after setData() was called)
qcScores
Data.frame of scores from a qcMetric (computed within workerFcn())
mzQC
A named list of mzQC MzQCqualityMetric's, named by their fc.raw.file for runQuality, or by concatenated fc.raw.files for setQualities (e.g. "file 1;file4"); valid after setData() was called.
qcCat
QC category (LC, MS, or prep)
qcName
Name of the qcScore in the heatmap
orderNr
Column index during heatmap generation and for the general order of plots
require(ggplot2) dd = data.frame(x=1:10, y=11:20) a = qcMetric$new(helpText="small help text", ## arbitrary arguments, matched during setData() workerFcn=function(.self, data, gtit) { ## usually some code here to produce ggplots pl = lapply(1:2, function(xx) { ggplot(data) + geom_point(aes(x=x*xx,y=y)) + ggtitle(gtit) }) return(list(plots = pl)) }, qcCat="LC", qcName="MS/MS Peak shape", orderNr = 30) ## test some output a$setData(dd, "my title") a$plots ## the raw plots a$getPlots(TRUE) ## same as above a$getPlots(FALSE) ## plots without title a$getTitles() ## get the titles of the all plots a$helpText a$qcName
... and report the row of the matrix which has maximum sum (i.e. the best "reference" distribution). The returned data.frame has as many rows as distributions given and two columns. The first column 'name' gives the name of the list element, the second column 'ks_best' gives '1-statistic' of the Kolmogorov-Smirnov test to the "reference" distribution (which was picked by maximising the sum of 'ks_best'). Thus, the row with a 'ks_best' of 1 is the reference distribution.
qualBestKS(x)
x |
List of vectors, where each vector holds a distribution |
A data.frame with ks-test values of the "reference" to all other distributions (see Details)
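A small sketch with illustrative distributions (not from the original documentation):
dists = list(fileA = rnorm(300), fileB = rnorm(300, mean = 0.2), fileC = rnorm(300, mean = 3))
qualBestKS(dists)  ## the reference row has ks_best == 1; the dissimilar 'fileC' scores low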
Ranges between 0 (worst score) and 1 (best score). A median of zero gives the best score of 1. The closer the median is to the most extreme value of the distribution, the smaller the score (until reaching 0). Can be used for calibrated mass errors, as a measure of how well they are centered around 0. E.g. if the median is 0.1, while the range is [-0.5, 0.5], the score will be 0.8 (punishing the 20% deviation from the optimum). If the range of the data is asymmetric, e.g. [-1.5, -0.5], and does not include zero, the score cannot reach 1, since the median can never be zero.
qualCentered(x)
x |
Numeric values (e.g. ppm errors) |
Value between [0, 1]
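A sketch reusing the worked numbers from the description above (the expected scores follow from that description):
qualCentered(c(-0.5, 0.1, 0.5))  ## median 0.1 within range [-0.5, 0.5] => expected score 0.8
qualCentered(c(-0.5, 0, 0.5))    ## median 0 => best score of 1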
Ranges between 0 (worst score) and 1 (best score). The best score is achieved when the median of 'x' is close to the center of the interval [-tol, tol]. If median of 'x' is close to the border (on either side), the score decreases linearly to zero. Can be used for uncalibrated mass errors, as a measure of how well they are centered around 0.
qualCenteredRef(x, tol)
x |
Vector of values (hopefully in interval [-tol, tol]) |
tol |
Border of interval (must be positive) |
NA's are removed for all computations.
Value between [0, 1]
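A sketch with illustrative values (the exact scores depend on the random draw):
qualCenteredRef(rnorm(100, mean = 0, sd = 1), tol = 5)  ## median near the centre => score close to 1
qualCenteredRef(rnorm(100, mean = 4, sd = 1), tol = 5)  ## median near the border => score close to 0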
Measure for centeredness around 0. Highest score is 1, worst score is 0.
qualGaussDev(mu, sd)
mu |
Center of Gaussian |
sd |
SD of Gaussian |
quality, ranging from 0 (bad agreement) to 1 (perfect, i.e. centered at 0)
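A sketch; the expected values are inferred from the description above, not taken from the package documentation:
qualGaussDev(mu = 0, sd = 1)  ## expected: 1 (perfectly centered at 0)
qualGaussDev(mu = 2, sd = 1)  ## expected: a considerably lower score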
The score is computed according to
qualHighest(x, N)
x |
Vector of numeric values (e.g. height of histogram bins) |
N |
Length of x (just a precaution currently) |
q = ( (N-1) - sum_i( (N-i) * x_i ) ) / (N-1), where x is first normalized to fractions (sum(x) == 1) and i runs from 1 to N.
Scores range from 0 (worst) to 1 (best). E.g. c(0,0,0,16) would yield a score of 1; c(16,0,0,0) gives a score of 0.
Quality score in the range of [0,1]
qualHighest(c(0,0,0,16), 4) ## 1 qualHighest(c(16,0,0,0), 4) ## 0 qualHighest(c(1,1,1,1), 4) ## 0.5 qualHighest(c(0,16,0,0), 4) ## 1/3
Ranges between 0 (worst score) and 1 (best score). Useful for performance measures where reaching a certain reference threshold 't' will be enough to reach 100%. The input range from [0, t] is scored from 0-100%.
qualLinThresh(x, t = 1)
x |
Numeric value(s) between [0, inf] |
t |
Threshold value, which indicates 100% |
Value between [0, 1]
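A sketch; the expected scores follow from the linear scaling described above:
qualLinThresh(5, t = 10)   ## halfway to the threshold => expected score 0.5
qualLinThresh(10, t = 10)  ## threshold reached => expected score 1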
Ranges between 0 (worst score) and 1 (best score). Input must be between [0,1]. Deviations from the median of the sample represent the score for each sample point.
qualMedianDist(x)
x |
A vector numeric values between [0,1] |
A vector of the same size as x, with quality values between [0, 1]
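A sketch with an illustrative input vector (in [0, 1], as required):
x = c(0.4, 0.5, 0.6, 0.9)
qualMedianDist(x)  ## one score per element; values far from the median score lower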
The score ranges between 0 (worst score) and 1 (best score). Input 'x' is a vector of counts (or probabilities) for equally spaced bins in a histogram. A uniform distribution (e.g. c(3,3,3)) will get a score of 1. The worst possible case (e.g. c(4,0,0)) will get a score of 0, and a linearly increasing function (e.g. c(1,2,3)) will get something in between (0.585 here).
qualUniform(x, weight = vector())
x |
Vector of numeric intensity/count values (e.g. ID's per RT bin); bins are assumed to have equal widths |
weight |
Vector of weights for values in 'x' (same length as 'x'). |
In addition, bin values can be weighted (e.g. by their confidence). The total sum of weights is normalized to 1 internally.
The distance function used is the square root of the absolute difference between a uniform distribution and the input 'x' (summed for each element of 'x'). This distance is normalized to the worst possible input (e.g. a single bin containing 100% of the data).
Value between [0, 1]
stopifnot(qualUniform(c(3,3,3))==1) stopifnot(qualUniform(c(4,0,0))==0) ## how 'uniform' is a vector where only a single index has weight?-- answer: very stopifnot(qualUniform(c(4,0,0), c(1,0,0))==1) stopifnot(qualUniform(c(4,0,0), c(0,1,0))==1) stopifnot(qualUniform(c(0,4,0))==0) stopifnot(abs(qualUniform(c(3,2,1))-0.58578) < 0.0001) stopifnot(abs(qualUniform(c(1,2,3))-0.58578) < 0.0001) stopifnot(qualUniform(c(1,2,3), c(0,1,0))==1) stopifnot(abs(qualUniform(c(1,2,3))-0.58578) < 0.0001) stopifnot(abs(qualUniform(c(1,2,3), c(0,1,1))- 0.590316) < 0.0001) stopifnot(abs(qualUniform(c(2,3), c(1,1))-0.552786) < 0.0001) stopifnot(abs(qualUniform(1:120)-0.38661) < 0.0001)
For params, see MQDataReader::readMQ().
read.MQ( file, filter = "", type = "pg", col_subset = NA, add_fs_col = 10, LFQ_action = FALSE, ... )
file |
see |
filter |
see |
type |
see |
col_subset |
see |
add_fs_col |
see |
LFQ_action |
see |
... |
see |
see MQDataReader::readMQ()
Given a vector of (short/long) filenames, translate to the (long/short) version
renameFile(f_names, mapping)
f_names |
Vector of filenames |
mapping |
A data.frame with from,to columns |
A vector of translated file names as factor (ordered by mapping!)
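A sketch with made-up file names:
mapping = data.frame(from = c("TK20130501_longNameA", "TK20130501_longNameB"),
                     to   = c("file A", "file B"))
renameFile(c("TK20130501_longNameB", "TK20130501_longNameA"), mapping)
## --> factor with values "file B", "file A" (levels ordered by 'mapping')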
Repeat each element x_i in X, n_i times.
repEach(x, n)
x |
Values to be repeated |
n |
Number of repeat for each x_i (same length as x) |
Vector with each value x_i from x repeated n_i times
repEach(1:3, 1:3) ## 1, 2, 2, 3, 3, 3
Simply CV*100, i.e. the coefficient of variation expressed in percent.
RSD(x)
x |
Vector of numeric values |
RSD
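A sketch; the equivalence to sd(x)/mean(x)*100 follows from the CV*100 definition above:
x = c(8, 10, 12)
RSD(x)                 ## 20: the standard deviation is 20% of the mean
sd(x) / mean(x) * 100  ## same value, computed by hand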
This allows the user to judge which Raw files have similar corrected RT's (i.e. were aligned successfully). If there are clear sub-clusters, it might be worth introducing artificial fractions into MaxQuant, to avoid ID-transfer between these clusters (use the MBR-Align and MBR-ID-Transfer metrics to support the decision).
RTalignmentTree(df_evd, col_fraction = c())
df_evd |
Evidence table containing calibrated retention times and sequence information. |
col_fraction |
Empty vector, or a one-element vector giving the name of the fraction column (if it exists) |
If the input contains fractions, leaf nodes will be colored accordingly. Distinct sub-clusters should have their own color. If not, MaxQuant's fraction settings should be optimized. Note that introducing fractions in MaxQuant will naturally lead to a clustering here (it's somewhat circular).
ggplot object containing the correlation tree
Inverse the order of items on the x-axis (for discrete scales)
scale_x_discrete_reverse(values, ...)
values |
The vector of values as given to the x aesthetic |
... |
Other arguments forwarded to 'scale_x_discrete()' |
ggplot object, concatenatable with '+'
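A usage sketch (arbitrary data), assuming the typical ggplot2 pattern of adding the scale with '+':
require(ggplot2)
d = data.frame(f = c("a", "b", "c"), v = 1:3)
ggplot(d, aes(x = f, y = v)) + geom_col() + scale_x_discrete_reverse(d$f)  ## x-axis shows c, b, a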
Inverse the order of items on the y-axis (for discrete scales)
scale_y_discrete_reverse(values, ...)
values |
The vector of values as given to the y aesthetic |
... |
Other arguments forwarded to 'scale_y_discrete()' |
ggplot object, concatenatable with '+'
Scales a vector of values linearly to [0, 1]. If all input values are equal, the returned values are all 0.
scale01linear(X)
X |
Vector of values |
Scaled vector
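A sketch; the expected outputs follow directly from the description above:
scale01linear(c(2, 4, 6))  ## 0.0 0.5 1.0
scale01linear(c(5, 5, 5))  ## all 0, since all inputs are equal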
Using the result from 'alignmentCheck()', score the features of every Raw file and see if they have been properly aligned. Returned value is between 0 (bad) and 1 (all aligned).
ScoreInAlignWindow(data, allowed.deltaRT = 1)
data |
A data.frame with columns 'rtdiff' and 'raw.file' |
allowed.deltaRT |
The allowed matching difference (1 minute by default) |
A data.frame with one row for each raw.file and columns 'raw.file' and 'withinRT' (0-1)
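A sketch with made-up residual RT differences; the expected values assume 'withinRT' is the fraction of features inside the allowed window:
d = data.frame(raw.file = rep(c("file A", "file B"), each = 4),
               rtdiff   = c(0.2, -0.5, 0.1, 0.9,   2.5, -1.4, 0.3, 0.2))
ScoreInAlignWindow(d, allowed.deltaRT = 1)
## expected 'withinRT': 1.0 for "file A" (all |rtdiff| <= 1) and 0.5 for "file B"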
Some axis labels are sometimes just too long and printing them will either squeeze the actual plot (ggplot) or make the labels disappear beyond the margins (graphics::plot). One ad-hoc way of avoiding this is to shorten the names, hoping they are still meaningful to the viewer.
shortenStrings(x, max_len = 20, verbose = TRUE, allow_duplicates = FALSE)
x |
Vector of input strings |
max_len |
Maximum length allowed |
verbose |
Print which strings were shortened |
allow_duplicates |
If shortened strings are not discernible any longer, consider the short version valid (not the default); otherwise (default) return the full string (--> no-op) |
This function should be applied AFTER you tried more gentle methods, such as delLCP or simplifyNames.
A vector of shortened strings
r = shortenStrings(c("gamg_101", "gamg_101230100451", "jurkat_06_100731121305", "jurkat_06_1")) all(r == c("gamg_101", "gamg_101230100..", "jurkat_06_1007..", "jurkat_06_1"))
Usually handy for plots, where condition names should be as concise as possible. E.g. you do not want names like 'TK20130501_H2M1_010_IMU008_CISPLA_E3_R1.raw' and 'TK20130501_H2M1_026_IMU008_CISPLA_E7_R2.raw' but rather 'TK.._010_I.._E3_R1.raw' and 'TK.._026_I.._E7_R2.raw'
If multiple such substrings exist, the algorithm will remove the longest first and iterate a number of times (two by default) to find the second/third etc. longest common substring. Each substring must fulfill a minimum length requirement - if it's shorter, it's not considered worth removing and the iteration is aborted.
simplifyNames( strings, infix_iterations = 2, min_LCS_length = 7, min_out_length = 7 )
strings |
A vector of strings which are to be shortened |
infix_iterations |
Number of successive rounds of substring removal |
min_LCS_length |
Minimum length of the longest common substring (default:7, minimum: 6) |
min_out_length |
Minimum length of shortest element of output (no shortening will be done which causes output to be shorter than this threshold) |
A list of shortened strings, with the same length as the input
#library(PTXQC) simplifyNames(c('TK20130501_H2M1_010_IMU008_CISPLA_E3_R1.raw', 'TK20130501_H2M1_026_IMU008_CISPLA_E7_R2.raw'), infix_iterations = 2) # --> "TK.._010_I.._E3_R1.raw","TK.._026_I.._E7_R2.raw" try(simplifyNames(c("bla", "foo"), min_LCS_length=5)) # --> error, since min_LCS_length must be >=6
If there is no unique prefix (e.g. if a string is contained twice), then the length of the longest string is returned, i.e. if the return value is used in a call to substr(), nothing happens, e.g. substr(x, 1, supCount(x)) == x.
supCount(x, prefix_l = 1)
x |
Vector of strings |
prefix_l |
Starting prefix length, which is incremented in steps of 1 until all prefixes are unique (or maximum string length is reached) |
Integer with minimal prefix length required
supCount(c("abcde...", "abcd...", "abc...")) ## 5 x = c("doubled", "doubled", "aLongDummyString") all( substr(x, 1, supCount(x)) == x ) ## TRUE (no unique prefix due to duplicated entries)
A blank theme (similar to the deprecated theme_blank())
theme_blank()
A ggplot2 object, representing an empty theme
All values in the numerical column 'filterColname' are assigned to bins of width 'binsize'. Only one value per bin is retained. All other rows are removed and the reduced data.frame with all its columns is returned.
thinOut(data, filterColname, binsize)
data |
The data.frame to be filtered |
filterColname |
Name of the filter column as string |
binsize |
Width of a bin |
Data.frame with reduced rows, but identical input columns
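A sketch with arbitrary column names and data:
d = data.frame(RT = runif(5000, 0, 120), intensity = rnorm(5000))
d_small = thinOut(d, "RT", binsize = 0.5)  ## keep at most one row per 0.5-unit RT bin
nrow(d_small)                              ## much smaller than nrow(d)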
The binsize is computed from the global data range of the filter column by dividing the range into binCount bins.
thinOutBatch(data, filterColname, batchColname, binCount = 1000)
data |
The data.frame to be split and filtered(thinned) |
filterColname |
Name of the filter column as string |
batchColname |
Name of the split column as string |
binCount |
Number of bins in the 'filterColname' dimension. |
Data.frame with reduced rows, but identical input columns
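A sketch building on the thinOut() example above (data and column names are arbitrary):
d = data.frame(fc.raw.file = rep(c("file A", "file B"), each = 2500),
               RT = runif(5000, 0, 120), intensity = rnorm(5000))
d_small = thinOutBatch(d, "RT", "fc.raw.file", binCount = 100)  ## thin each Raw file separately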
This function gives the user a chance to make the output file writable before a write attempt is actually made by R, to avoid having to run the whole program again upon write failure.
wait_for_writable( filename, prompt_text = paste0("The file '", filename, "' is not writable. Please close all applications using this file. Press '", abort_answer, "' to abort!"), abort_answer = "n" )
filename |
The file to test for writable |
prompt_text |
If not writable, show this prompt text to the user |
abort_answer |
If the user enters this string into the prompt, this function will stop() |
Note: The file will not be overwritten or changed by this function.
TRUE if writable, FALSE if aborted by user or (not-writeable and non-interactive)
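A sketch (the output path is hypothetical); in non-interactive sessions a non-writable file simply yields FALSE instead of a prompt:
report_file = "report.pdf"  ## hypothetical output path
if (wait_for_writable(report_file)) {
  pdf(report_file)  ## safe to (over)write now; the check itself did not modify the file
  plot(1:10)
  dev.off()
}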
If the object has the param, then return it. If the param is unknown, create it with the given default value and return the default.
yamlObj
A Yaml object as created by yaml.load
getYAML(param_name, default, min = NA, max = NA)
Query this YAML object for a certain parameter and return its value. If it does not exist it is created with a default value. An optional min/max range can be specified and will be enforced if the value is known (default will be used upon violation).
setYAML(param_name, value)
Set a YAML parameter to a certain value. Overwrites the old value or creates a new entry if hitherto unknown.
writeYAML(filename)
Write YAML config (including some documentation) to a YAML file. Returns TRUE on success (always), unless writing the file generates an error.
yc = YAMLClass$new(list()) val = yc$getYAML("cat$subCat", "someDefault") val ## someDefault val = yc$setYAML("cat$subCat", "someValue") val ## someValue yc$getYAML("cat$subCat", "someDefault") ## still 'someValue' (since its set already)