Run polyOrigin

Run function polyOrigin for haplotype reconstruction. Here is a simulated example of a tetraploid population, consists of four subpopulations: three crosses among 3 parents and one selfing of the first parent. The genofile contains genetic map with 120 SNPs and dosage data for 3 parents and 120 offspring.

using PolyOrigin
workdir = joinpath(pkgdir(PolyOrigin),"docs","run_polyOrigin")
genofile = "geno_disturbed.csv"
pedfile = "ped.csv"
outstem = "outstem"
polyancestry = polyOrigin(genofile,pedfile;
    chrpairing=44,
    isphysmap = false,
    refinemap=true,
    refineorder=true,
    workdir,
    outstem,
)
filter(x->occursin(outstem,x), readdir(workdir))
8-element Vector{String}:
 "outstem.log"
 "outstem_genoprob.csv"
 "outstem_maprefined.csv"
 "outstem_parentphased.csv"
 "outstem_parentphased_corrected.csv"
 "outstem_polyancestry.csv"
 "outstem_postdoseprob.csv"
 "outstem_valentsummary.csv"

Here chrpairing=44 denotes bivalent and quadrivalent formations in ancestral inference. isphysmap specifies if the input map is a physical map, refinemap specifies if refine the input map and refineorder specifies further if refine only inter-marker distances (refinemap=true and refineorder=false), workdir specifies the directory of input and output files, and outstem specifies the filename stem for output files.

The polyOrigin function produces the following outputfiles:

  • outstem.log: log file saves messages that are printed on console.
  • outstem_maprefined.csv: same as the input genofile, except that input marker map is refined.
  • outstem_parentphased.csv: same as the input genofile, except that parental genotypes are phased.
  • outstem_parentphased_corrected.csv: the phased parent genotypes are corrected.
  • outstem_polyancestry.csv: saves the returned variable polyancestry. See also polyOrigin for more details.
  • outstem_genoprob.csv: a concise version of the above file, including genetic map, phased parental genotypes, and posterior origin-genotype probabilities.
  • outstem_postdoseprob.csv: same as the input genofile, except that parent genotypes are phased and offspring genotypes are given by the posterior dose probabilities.

Here outstem_maprefined.csv, outstem_parentphased.csv, outstem_parentphased_corrected.csv, or outstem_postdoseprob.csv may be iteratively used as the input genofile.

Visualize conditional probability

The outstem_polyancestry.csv can be read back by the function readPolyAncestry, and ancestral conditional probability can be visualized by plotCondprob or animCondprob.

polyancestry = readPolyAncestry("outstem_polyancestry.csv",workdir=workdir)
truegeno = readTruegeno!("true.csv",polyancestry,workdir=workdir)
using Plots
animCondprob(polyancestry; truegeno=truegeno,
    left_margin = 8Plots.mm, bottom_margin = 5Plots.mm)

Evaluate estimated map

Compare input map and estimated map with true map

polygeno=readPolyGeno(genofile,pedfile,workdir=workdir)
fig = plotMapComp(truegeno.truemap,polygeno.markermap,
    xlabel="True position (cM)",
    ylabel="Input position (cM)")
fig2 = plotMapComp(truegeno.truemap, polyancestry.markermap,
    xlabel="True position (cM)",
    ylabel="Estimated position (cM)")
using Plots
plot(fig,fig2,left_margin = 8Plots.mm, bottom_margin = 5Plots.mm)

where tau is the Kendall rank correlation between true map and comparing map.