Prediction Output
Inside the predict output directory (-o/--outdir) you'll find a collection of files and directories. We outline the files that are likely to be of interest to most users.
Prediction JSON
This is <sample>.drprg.json. The value given to the -s/--sample option dictates the name, i.e. <sample>.drprg.json.
This is the only file most users will want/need to interact with. It contains the resistance prediction for each drug in the index's catalogue.
Example
This is a trimmed (toy) example JSON output for a sample:
{
  "genes": {
    "absent": [
      "ahpC"
    ],
    "present": [
      "embA",
      "embB",
      "ethA",
      "fabG1",
      "gid",
      "gyrA",
      "gyrB",
      "inhA",
      "katG",
      "pncA",
      "rpoB",
      "rrs"
    ]
  },
  "sample": "toy",
  "susceptibility": {
    "Amikacin": {
      "evidence": [
        {
          "gene": "rrs",
          "residue": "DNA",
          "variant": "A1401X",
          "vcfid": "b815ed3f"
        }
      ],
      "predict": "F"
    },
    "Ethambutol": {
      "evidence": [
        {
          "gene": "embB",
          "residue": "PROT",
          "variant": "M306I",
          "vcfid": "a290b118"
        }
      ],
      "predict": "r"
    },
    "Ethionamide": {
      "evidence": [
        {
          "gene": "ethA",
          "residue": "PROT",
          "variant": "A381P",
          "vcfid": "169f75d4"
        }
      ],
      "predict": "U"
    },
    "Isoniazid": {
      "evidence": [
        {
          "gene": "fabG1",
          "residue": "DNA",
          "variant": "G-17T",
          "vcfid": "de9b689e"
        },
        {
          "gene": "katG",
          "residue": "PROT",
          "variant": "S315T",
          "vcfid": "acaa8ca2"
        }
      ],
      "predict": "R"
    },
    "Levofloxacin": {
      "evidence": [],
      "predict": "S"
    }
  },
  "version": {
    "drprg": "0.1.1",
    "index": "20230308"
  }
}
The keys of the JSON are:
genes: This contains a list of genes in the index reference graph which are present and absent
sample: The value passed to the -s/--sample option
susceptibility: The keys of this entry are the drugs in the index catalogue. Each drug's entry contains evidence supporting the value in the predict section.
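As a rough illustration of how these keys fit together, the following Python sketch parses a cut-down copy of the toy example and collects the prediction and supporting mutations for each drug. The summarise function is a hypothetical helper for this page, not part of drprg, and the embedded JSON is trimmed further than the example above. In practice you would read <sample>.drprg.json from the output directory instead of using an inline string.

```python
import json

# Cut-down copy of the toy example; a real run would read
# <sample>.drprg.json from the predict output directory instead.
EXAMPLE = """
{
  "genes": {"absent": ["ahpC"], "present": ["katG", "rpoB"]},
  "sample": "toy",
  "susceptibility": {
    "Isoniazid": {
      "evidence": [
        {"gene": "katG", "residue": "PROT", "variant": "S315T", "vcfid": "acaa8ca2"}
      ],
      "predict": "R"
    },
    "Levofloxacin": {"evidence": [], "predict": "S"}
  }
}
"""


def summarise(report):
    """Return {drug: (predict, [gene_variant, ...])} from a parsed report.

    Hypothetical helper: it only relies on the keys described above.
    """
    summary = {}
    for drug, entry in report["susceptibility"].items():
        mutations = [f'{ev["gene"]}_{ev["variant"]}' for ev in entry["evidence"]]
        summary[drug] = (entry["predict"], mutations)
    return summary


report = json.loads(EXAMPLE)
summary = summarise(report)

# One line per drug: name, prediction, supporting mutations ("-" if none).
for drug, (predict, mutations) in summary.items():
    print(drug, predict, ",".join(mutations) or "-")
```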
Predict
The predict entry for a drug is the resistance prediction for the sample. Possible values are:
S: susceptible. This is the "default" prediction. If no mutations are detected for the sample, it is assumed to be susceptible
F: failed. Genotyping failed for one or more mutations for this drug. See the prediction VCF for more information
U: unknown. One or more mutations that are not present in the index catalogue were detected in a gene associated with this drug
R: resistant. One or more mutations from the index catalogue that confer resistance were detected
u or r: The same as the uppercase versions, but the mutation(s) were detected in a minor allele.
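If you are post-processing many reports, it can help to turn these single-letter values into something more readable. The sketch below is one way to do that. The describe_prediction function and the MEANINGS table are hypothetical names for this page. It assumes, as listed above, that only U and R have lowercase (minor allele) forms.

```python
# Short descriptions paraphrasing the list of predict values above.
MEANINGS = {
    "S": "susceptible",
    "F": "failed",
    "U": "unknown",
    "R": "resistant",
}


def describe_prediction(code):
    """Return (meaning, minor_allele) for a predict value such as 'R' or 'r'.

    Hypothetical helper; raises ValueError for values not listed above.
    """
    # Assumption from the list above: only 'u' and 'r' occur in lowercase.
    minor = code in ("u", "r")
    key = code.upper() if minor else code
    if key not in MEANINGS:
        raise ValueError(f"unexpected predict value: {code!r}")
    return MEANINGS[key], minor


for code in ("S", "F", "U", "R", "u", "r"):
    meaning, minor = describe_prediction(code)
    print(code, meaning, "(minor allele)" if minor else "")
```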
Evidence
This is a list of the mutations supporting the prediction. The residue is one of DNA or PROT, indicating whether the mutation describes a nucleotide or amino acid change, respectively.
The variant is of the form <ref><pos><alt>, where <ref> is the reference sequence at position <pos> and <alt> is the nucleotide/amino acid the reference is changed to. See the catalogue docs for more information.
The vcfid is the value in the VCF ID column for this mutation, making it easier to find a mutation in the VCF.
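To make the <ref><pos><alt> form concrete, here is a minimal Python sketch that splits a variant string into its three parts. The regular expression is an assumption based only on the variants in the example above (such as S315T and the negative promoter position in G-17T). The catalogue docs remain the authority on which forms are valid, and parse_variant is a hypothetical helper name.

```python
import re

# Assumed pattern: letters (or '*') for ref and alt, with an optionally
# negative integer position in between. Derived from the example only.
VARIANT_RE = re.compile(r"^([A-Za-z*]+)(-?\d+)([A-Za-z*]+)$")


def parse_variant(variant):
    """Split a variant such as 'S315T' into (ref, pos, alt)."""
    match = VARIANT_RE.match(variant)
    if match is None:
        raise ValueError(f"variant not of the form <ref><pos><alt>: {variant!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


# Variants taken from the example JSON above.
for variant in ("A1401X", "M306I", "G-17T", "S315T"):
    print(variant, parse_variant(variant))
```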
Prediction VCF
This is <sample>.drprg.bcf. As this file is a BCF, you will need to use bcftools to view it, e.g. bcftools view sample.drprg.bcf.
You should only need to interact with this file if you want further information about the exact evidence supporting a mutation being called, why a mutation was called as failed (F), or why a mutation wasn't called. For mutations listed in the JSON, you can look them up using the vcfid, which can be found in the ID (third) column of the BCF, or you can just use grep. For example, to look up the Isoniazid S315T mutation in katG from the example, you can use:
$ bcftools view sample.drprg.bcf | grep acaa8ca2
katG 1044 acaa8ca2 GC AC,CA,CC . PASS VC=PH_SNPs;GRAPHTYPE=SIMPLE;PDP=0,0.0123457,0,0.987654;VARID=katG_S315T;PREDICT=R GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF 3:0.1.1,42:0,0,0,38:0.1.1,42:0,0,0,38:0.1.1,127:0,0,0,116:1,1,1,0:-523.019,-514.096,-523.019,-7.87925:506.217
All INFO and FORMAT fields are defined in the header of the BCF file. We recommend reading the VCF/BCF specifications for help with how to interpret data in a VCF file.
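For quick inspection in a script, the INFO column of a record can be split into key/value pairs. The Python sketch below does this for the INFO string from the katG record shown above. It follows the general VCF convention of semicolon-separated KEY=VALUE entries and leaves every value as a string. parse_info is a hypothetical helper, and for anything beyond a quick look a dedicated VCF library is likely the safer choice.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of raw string values.

    Hypothetical helper: entries without '=' (flags) are stored as True.
    """
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# INFO column copied from the katG S315T record above.
INFO = "VC=PH_SNPs;GRAPHTYPE=SIMPLE;PDP=0,0.0123457,0,0.987654;VARID=katG_S315T;PREDICT=R"

info = parse_info(INFO)
print(info["VARID"], info["PREDICT"])
```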