Prediction Output

Inside the predict output directory (-o/--outdir) you'll find a collection of files and directories. We outline the files that are likely to be of interest to most users.

Prediction JSON

This is <sample>.drprg.json. The value given to the -s/--sample option dictates the name - i.e. <sample>.drprg.json.

This is the only file most users will want/need to interact with. It contains the resistance prediction for each drug in the index's catalogue.

Example

This is a trimmmed (toy) example JSON output for a sample

{
  "genes": {
    "absent": [
      "ahpC"
    ],
    "present": [
      "embA",
      "embB",
      "ethA",
      "fabG1",
      "gid",
      "gyrA",
      "gyrB",
      "inhA",
      "katG",
      "pncA",
      "rpoB",
      "rrs"
    ]
  },
  "sample": "toy",
  "susceptibility": {
    "Amikacin": {
      "evidence": [
        {
          "gene": "rrs",
          "residue": "DNA",
          "variant": "A1401X",
          "vcfid": "b815ed3f"
        }
      ],
      "predict": "F"
    },
    "Ethambutol": {
      "evidence": [
        {
          "gene": "embB",
          "residue": "PROT",
          "variant": "M306I",
          "vcfid": "a290b118"
        }
      ],
      "predict": "r"
    },
    "Ethionamide": {
      "evidence": [
        {
          "gene": "ethA",
          "residue": "PROT",
          "variant": "A381P",
          "vcfid": "169f75d4"
        }
      ],
      "predict": "U"
    },
    "Isoniazid": {
      "evidence": [
        {
          "gene": "fabG1",
          "residue": "DNA",
          "variant": "G-17T",
          "vcfid": "de9b689e"
        },
        {
          "gene": "katG",
          "residue": "PROT",
          "variant": "S315T",
          "vcfid": "acaa8ca2"
        }
      ],
      "predict": "R"
    },
    "Levofloxacin": {
      "evidence": [],
      "predict": "S"
    }
  },
  "version": {
    "drprg": "0.1.1",
    "index": "20230308"
  }
}

The keys of the JSON are

  • genes: This contains a list of genes in the index reference graph which are present and absent
  • sample: The value passed to the -s/--sample option
  • susceptibility: The keys of this entry are the drugs in the index catalogue. Each drug's entry contains evidence supporting the value in the predict section.

Predict

The predict entry for a drug is the resistance prediction for the sample. Possible values are

  • S: susceptible. This is the "default" prediction. If no mutations are detected for the sample, it is assumed to be susceptible
  • F: failed. Genotyping failed for one or more mutations for this drug. See the prediction VCF for more information
  • U: unknown. One or more mutations that are not present in the index catalogue were detected in a gene associated with this drug
  • R: resistant. One or more mutations from the index catalogue that confer resistance were detected
  • u or r: The same as the uppercase versions, but the mutation(s) were detected in a minor allele.

Evidence

This is a list of the mutations supporting the prediction. The residue is one of DNA or PROT indicating whether the mutation describes a nucleotide or amino acid change, respectively.

The variant is of the form <ref><pos><alt>; where <ref> is the reference sequence at position <pos> and <alt> is the nucleotide/amino acid the reference is changed to. See the catalogue docs for more information.

The vcfid is the value in the VCF ID column for this mutation, making it easier to find a mutation in the VCF.

Prediction VCF

This is <sample>.drprg.bcf. As this file is a BCF, you will need to use bcftools to view it - e.g. bcftools view sample.drprg.bcf.

You should only need to interact with this file if you want further information about the exact evidence supporting a mutation being called, why a mutation was called as failed (F), or why a mutation wasn't called. For those mutations in the JSON, you can easily look them up using the vcfid, which can be found in the ID (third) column of the BCF. Or you can just use grep. For example, to look up the Isoniazid S315T mutation in katG from the example you can use

$ bcftools view sample.drprg.bcf | grep acaa8ca2
katG    1044    acaa8ca2        GC      AC,CA,CC        .       PASS    VC=PH_SNPs;GRAPHTYPE=SIMPLE;PDP=0,0.0123457,0,0.987654;VARID=katG_S315T;PREDICT=R       GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF      3:0.1.1,42:0,0,0,38:0.1.1,42:0,0,0,38:0.1.1,127:0,0,0,116:1,1,1,0:-523.019,-514.096,-523.019,-7.87925:506.217

All INFO and FORMAT fields are defined in the header of the BCF file. We recommend reading the VCF/BCF specifications for help with how to interpret data in a VCF file.