Skip to content

ReportFpgOutputForObservationalModel

FPG simulations have complete knowledge of every parasite genome in the population, but real-world genomic surveillance collects data through specific sampling strategies that capture only a fraction of infections. ReportFpgOutputForObservationalModel bridges this gap by extracting the complete genetic data on all filtered infected individuals, allowing post-processing tools such as the FPGObservationalModel to apply realistic surveillance sampling strategies and study what genetic signals different approaches can detect.

Unlike most EMOD reports, which produce a single output file named after the report class, this report produces a three-file ensemble: infIndexRecursive-genomes-df.csv, variants.npy, and roots.npy. This report is intended for simulations where Malaria_Model is set to MALARIA_MECHANISTIC_MODEL_WITH_PARASITE_GENETICS.

Seealso

FPG model — For an overview of the FPG model, genome configuration, and the full FPG workflow.

Output files

  • infIndexRecursive-genomes-df.csv - A list of infected individuals in each node at each time step, where each row represents one person.
  • variants.npy - A numpy binary file containing the nucleotide sequence data for each genome referenced in the recursive_nid column of infIndexRecursive-genomes-df.csv. The row index into this array corresponds to the genome index values in recursive_nid.
  • roots.npy - A numpy binary file containing the allele root data for each genome referenced in the recursive_nid column of infIndexRecursive-genomes-df.csv. The row index into this array corresponds to the genome index values in recursive_nid.

Configuration

To generate this report, configure the following parameters in the custom_reports.json file:

Parameter Data type Min Max Default Description
Start_Day float 0 3.40282e+38 0 The day of the simulation to start collecting data.
End_Day float 0 3.40282e+38 3.40282e+38 The day of the simulation to stop collecting data. If you want data collected on a specific day, enter that day plus 1.
Node_IDs_Of_Interest array of integers 1 2.14748e+09 [] Data will be collected for the nodes in this list. Empty list implies all nodes.
Min_Age_Years float 0 125 0 Minimum age in years of people to include in the report.
Max_Age_Years float 0 125 125 Maximum age in years of people to include in the report.
Must_Have_IP_Key_Value string NA NA (empty string) A Key:Value pair that the individual must have in order to be included. Empty string means to not include IPs in the selection criteria.
Must_Have_Intervention string NA NA (empty string) The name of the intervention that the person must have in order to be included. Empty string means to not include interventions in the selection criteria.
Minimum_Parasite_Density float 0 3.40282e+38 1.0 The minimum parasite density (asexual parasites per microliter of blood) that an infection must have to be included. A non-zero value filters out hepatocyte-stage infections and those with only gametocytes.
Sampling_Period float 1 3.40282e+38 1 The number of days between sampling the population. Data is collected on days Start_Day, Start_Day + Sampling_Period, Start_Day + 2*Sampling_Period, and so on.
Include_Genome_IDs boolean NA NA 0 If true (1), an additional genome_ids column is appended to the CSV output containing EMOD's internal ID for the genome of each infection's parasite. This ID can be used to cross-reference genome data with other EMOD reports that include genome IDs.
{
    "Reports": [
        {
            "class": "ReportFpgOutputForObservationalModel",
            "Start_Day": 3650,
            "End_Day": 4381,
            "Node_IDs_Of_Interest": [1, 3],
            "Min_Age_Years": 0,
            "Max_Age_Years": 5,
            "Must_Have_IP_Key_Value": "Accessibility:YES",
            "Must_Have_Intervention": "AntimalarialDrug",
            "Minimum_Parasite_Density": 1.0,
            "Sampling_Period": 30.4166667,
            "Include_Genome_IDs": 0
        }
    ],
    "Use_Defaults": 1
}

This example collects data after running for 10 years and collects it for the next 2 years. It only collects data from nodes 1 and 3 for children 5 and under who have accessibility to healthcare and are taking anti-malarial drugs. The infections must have a parasite density of at least 1.0. The Sampling_Period of 30.4166667 is 365/12, resulting in 12 collections per year (approximately monthly). The report will have entries for the following days:

3650, 3681, 3711, 3742, 3772, 3803, 3833, 3863, 3894, 3924, 3955, 3985, 4015, 4046, 4076, 4107, 4137, 4168, 4198, 4228, 4259, 4289, 4320, 4350, 4380

There are 25 entries — the initial collection on day 3650 plus 24 subsequent monthly collections, ending on day 4380 (exactly two years later). End_Day is set to 4381 rather than 4380 because the report collects data only on days strictly less than End_Day; it must be set one day past the last desired collection day to include it.

Output file: infIndexRecursive-genomes-df.csv

Each row of the report represents one infected person sampled at a given time step. Only individuals with at least one infection meeting the Minimum_Parasite_Density threshold are included. The report contains the following columns:

Column Data type Description
population integer The external ID of the node the person is currently in.
year integer The year of the data starting at zero. Used as a label for the time bin of data.
month integer A value from 0 to 11 that, together with the year column, specifies the time bin of data.
infIndex integer A unique identifier for this row of data; an increasing integer with each row.
day float The day of the simulation in EMOD. The year and month values correspond to this day. For example, if day is 715, then year=1 and month=11.
count (not used) This column is not used.
age_day float The age of the person in days.
fever_status integer 0 = no fever, 1 = has fever (clinical disease symptoms present).
recursive_nid array of integers A quoted list of genome indices — one per qualifying infection the person has — where each value is the row index into variants.npy and roots.npy for that infection's genome data (e.g., "[0,1,2]"). The entries in recursive_nid, infection_ids, bite_ids, and genome_ids are parallel arrays: the i-th entry in each refers to the same infection.
recursive_count integer The number of active infections meeting the Minimum_Parasite_Density threshold; equals the number of entries in recursive_nid.
IndividualID integer The unique ID of the person in EMOD.
infection_ids array of integers A quoted list of unique EMOD infection IDs, one per qualifying infection. Entries are in the same order as recursive_nid.
bite_ids array of integers A quoted list of bite IDs, one per qualifying infection, identifying the mosquito bite that initiated each infection. Entries are in the same order as recursive_nid.
genome_ids array of integers (Optional) A quoted list of EMOD's internal genome IDs, one per qualifying infection. Entries are in the same order as recursive_nid. Only present when Include_Genome_IDs is set to true (1).

Example

The following is an example of infIndexRecursive-genomes-df.csv:

population,year,month,infIndex,day,count,age_day,fever_status,recursive_nid,recursive_count,IndividualID,infection_ids,bite_ids,genome_ids
1,0,9,0,300,,1108.94,1,"[0,1,2,3,4,5,6,7,8,9]",10,2,"[2534,3364,3643,7816,7817,7818,7819,10932,10933,10934]","[497532,526283,537262,613210,613210,613210,613210,661772,661772,661772]","[14,4,10,7714,7717,7718,34,22,525,15318]"
1,0,9,1,300,,8153.8,1,"[10,11,12,13,14,15,16,17,1]",9,3,"[2428,6284,6285,6286,9469,9470,10935,10936,10937]","[495486,592462,592462,592462,636574,636574,662177,662177,662177]","[40,4933,4935,20,734,38,25625,25630,4]"
1,0,9,2,300,,843.391,1,"[18,19,20,21,22,23]",6,4,"[2348,7820,8390,8391,8392,8393]","[491221,612500,619455,619455,619455,619455]","[36,8,9474,32,9478,9479]"
1,0,9,3,300,,1729.52,1,"[19,24,25,26,27,6,28,0,29,30]",10,5,"[2260,7609,8018,8019,8394,8395,8396,8397,8972,8973]","[490394,611028,615591,615591,620939,620939,620939,620939,628136,628136]","[8,12,8320,8321,7715,34,12260,14,14459,14460]"
1,0,9,4,300,,6368.14,1,"[1,18,0,0,31,0,2,32,33,10]",10,6,"[2811,2812,4743,4892,5869,5870,6473,7249,7821,10522]","[504919,504919,569421,572401,586119,586119,595298,604633,611201,652666]","[4,36,14,14,3778,14,10,5748,9325,40]"