
plot_relationship_start

extract_assortivity_risk(start_rel_filename, relationship_type, male_risk_value='LOW')

For the given relationship type, extract the number of relationships that started during each time step for each risk value pair. Because the male risk value is held constant, the returned dataframe has three columns, one per female risk value.

Parameters:

Name Type Description Default
start_rel_filename (str, required)

The name and path of the RelationshipStart.csv file to be read.

required
relationship_type (int, required)

The type of relationship. Options: 0 (transitory), 1 (informal), 2 (marital), 3 (commercial).

required
male_risk_value str

The risk value of the male in the relationship being plotted. This will be either LOW, MEDIUM, or HIGH. Capitalization matters. Default is LOW.

'LOW'

Returns:

Type Description
DataFrame

Dataframe with three columns, one for each risk value pairing. Each row represents a simulation time (in days) at which relationships of the given type were created, and each column holds the count for one risk value pairing. There is no guarantee that relationships are created at every time step.
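
The counting described above can be illustrated on a small in-memory table. This is a minimal sketch, not the library's implementation: the column names (`start_time`, `rel_type`, `risk_a`, `risk_b`) and the helper `count_risk_pairs` are hypothetical, chosen only for illustration, since the module reads its column names from `COL_NAME_*` constants whose values are not shown on this page.

```python
import pandas as pd

# Hypothetical column names for this sketch; the module itself uses
# COL_NAME_* constants whose values are not shown on this page.
COL_TIME = "start_time"
COL_TYPE = "rel_type"
COL_RISK_A = "risk_a"   # risk value of the male partner
COL_RISK_B = "risk_b"   # risk value of the female partner
RISK_VALUES = ["LOW", "MEDIUM", "HIGH"]


def count_risk_pairs(df, relationship_type, male_risk_value="LOW"):
    """Count new relationships per start time for each male-female risk pair."""
    of_type = df[df[COL_TYPE] == relationship_type]
    # One row per start time at which any relationship of this type began.
    all_times = of_type[COL_TIME].unique()
    males = of_type[of_type[COL_RISK_A] == male_risk_value]

    result = pd.DataFrame(index=all_times)
    for rv in RISK_VALUES:
        per_time = males[males[COL_RISK_B] == rv].groupby(COL_TIME).size()
        # Time steps with no relationships for this pair get a count of 0.
        result[f"{male_risk_value}-{rv}"] = per_time.reindex(all_times, fill_value=0)
    return result


example = pd.DataFrame({
    COL_TIME:   [1, 1, 1, 2, 3, 3],
    COL_TYPE:   [0, 0, 0, 0, 0, 1],
    COL_RISK_A: ["LOW", "LOW", "LOW", "HIGH", "LOW", "LOW"],
    COL_RISK_B: ["LOW", "LOW", "HIGH", "LOW", "MEDIUM", "LOW"],
})
result = count_risk_pairs(example, relationship_type=0, male_risk_value="LOW")
print(result)
```

Time step 2 appears in the result with all zeros because a relationship of type 0 started then, but not one involving a LOW-risk male.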

Source code in emodpy_hiv/plotting/plot_relationship_start.py
def extract_assortivity_risk(start_rel_filename: str,
                             relationship_type: int,
                             male_risk_value="LOW"):
    """
    For the given relationship type, extract the number of relationships that started
    during each time step for each risk value pair.  The male risk value is constant
    so it should return a dataframe with three columns.

    Args:
        start_rel_filename (str, required):
            The name and path of the RelationshipStart.csv file to be read.

        relationship_type (int, required):
            The type of relationship. Options: 0 (transitory), 1 (informal), 2 (marital), 3 (commercial).

        male_risk_value (str, optional):
            The risk value of the male in the relationship being plotted.  This will be
            either LOW, MEDIUM, or HIGH.  Capitalization matters.
            Default is LOW.

    Returns:
        (pd.DataFrame): Dataframe with three columns where each column is for a risk value pairing.
            Each row should represent a simulation time (in days) that had relationships
            created of that type and risk value pairing.  There is no guarantee that
            relationships are created each time step.
    """
    df = pd.read_csv(start_rel_filename)

    if COL_NAME_REL_TYPE not in df.columns:
        raise ValueError(f"'{COL_NAME_REL_TYPE}' column does not exist in the file({start_rel_filename}).")

    if relationship_type not in df[COL_NAME_REL_TYPE].unique():
        raise ValueError(f"'{relationship_type}' does not appear as a relationship type in the file({start_rel_filename}).")

    df = df[ df[COL_NAME_REL_TYPE] == relationship_type ] # noqa: E201, E202

    results_df = pd.DataFrame()
    results_df.index = df[COL_NAME_START_TIME].unique()
    results_df[COL_NAME_START_TIME] = df[COL_NAME_START_TIME].unique()
    risk_values = ["LOW", "MEDIUM", "HIGH"]
    for rv_a in [male_risk_value]:
        df_risk = df[ df[COL_NAME_RISK_A] == rv_a ] # noqa: E201, E202
        for rv_b in risk_values:
            df_risk2 = df_risk[ df_risk[COL_NAME_RISK_B] == rv_b ] # noqa: E201, E202, E226
            tmp_df = pd.DataFrame()
            tmp_df[COL_NAME_START_TIME] = df_risk2[COL_NAME_START_TIME]
            tmp_df[COL_NAME_RISK_B] = df_risk2[COL_NAME_RISK_B]
            tmp_df = tmp_df.groupby(COL_NAME_START_TIME).count()
            if len(tmp_df[COL_NAME_RISK_B]) == 0:
                tmp_df = pd.DataFrame()
                tmp_df.index = df_risk[COL_NAME_START_TIME].unique()
                tmp_df[COL_NAME_START_TIME] = df_risk[COL_NAME_START_TIME].unique()
                tmp_df[COL_NAME_RISK_B] = 0
            results_df[rv_a + "-" + rv_b] = tmp_df[COL_NAME_RISK_B]
            results_df = results_df.fillna(0)

    del results_df[COL_NAME_START_TIME]
    results_df = results_df.fillna(0)

    return results_df

plot_relationship_assortivity_risk(dir_or_filename, relationship_type, male_risk_value='LOW', show_avg_per_run=False, show_regression=False, regression_dir=None, img_dir=None)

Create a plot showing the number of relationships of a given type that started during each time step for a male with the given risk value versus females with each of the possible risk values. For example, if the male's risk value is HIGH, the plot will contain three curves: HIGH-LOW, HIGH-MEDIUM, and HIGH-HIGH. They will all be for the given relationship type. We only do three curves because the data can be quite noisy.

The plot can optionally show a least squares regression line for each risk value pair. A CSV file can be saved with the regression data. This file can be used with the plot_a_vs_b() function to compare the regressions from two different sets of files.
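
As a rough illustration of the regression option, the sketch below fits one straight line per column by ordinary least squares. It uses `numpy.polyfit` as a stand-in, whereas the source listing further down uses scikit-learn's `LinearRegression`; for a single predictor both produce the same least squares line. The helper name `regression_lines` and the sample data are assumptions made for this example only.

```python
import numpy as np
import pandas as pd


def regression_lines(fractions: pd.DataFrame) -> pd.DataFrame:
    """Fit a least squares line to each column, using the index as x."""
    x = fractions.index.to_numpy(dtype=float)
    fitted = pd.DataFrame(index=fractions.index)
    for col in fractions.columns:
        y = fractions[col].to_numpy(dtype=float)
        # A degree-1 polynomial fit returns (slope, intercept).
        slope, intercept = np.polyfit(x, y, deg=1)
        fitted["Regression-" + col] = slope * x + intercept
    return fitted


# Made-up fractions indexed by simulation day.
fractions = pd.DataFrame({"HIGH-LOW": [0.2, 0.4, 0.6, 0.8]}, index=[0, 10, 20, 30])
fitted = regression_lines(fractions)
print(fitted)
```

Because the sample points lie exactly on a line, the fitted values reproduce the inputs; with noisy simulation output the fitted column would smooth the curve instead.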

Parameters:

Name Type Description Default
dir_or_filename (str, required)

The directory or filename containing the RelationshipStart.csv files.

required
relationship_type (int, required)

The type of relationship. Options: 0 (transitory), 1 (informal), 2 (marital), 3 (commercial).

required
male_risk_value str

The risk value of the male in the relationship being plotted. This will be either LOW, MEDIUM, or HIGH. Capitalization matters. Default is LOW.

'LOW'
show_avg_per_run bool

If 'dir_or_filename' is a directory, this will calculate the average number of relationships started at each timestep for the different files in the directory. Default is False.

False
show_regression bool

If true, a least squares regression line will be calculated and shown on the plot. There will be one line for each risk value pair. Default is False.

False
regression_dir str

If 'show_regression' is true and this provides a path to a directory, then a CSV file will be saved with the data points of the displayed regression lines. The name of the file will be the relationship type and the male's risk value. For example, COMMERCIAL-HIGH.csv. Default is None.

None
img_dir str

Directory to save the images. If None, the images will not be saved and a window will be opened.

None

Returns:

Source code in emodpy_hiv/plotting/plot_relationship_start.py
def plot_relationship_assortivity_risk(dir_or_filename: str,
                                       relationship_type: int,
                                       male_risk_value: str = "LOW",
                                       show_avg_per_run: bool = False,
                                       show_regression: bool = False,
                                       regression_dir: str = None,
                                       img_dir: str = None):
    """
    Create a plot showing the number of relationships of a given type
    that started during the timestep for a male with the given risk value
    versus females with the other possible values.  For example, if the
    male's risk value is HIGH, the plot will contain three curves:
    HIGH-LOW, HIGH-MEDIUM, and HIGH-HIGH.  They will all be for the
    given relationship type.  We only do three curves because the data
    can be quite noisy.

    The plot also has the option to show a least squares regression line
    for each risk value pair.  A CSV file can be saved with the regression
    data.  This can be used to compare with the plot_a_vs_b() function to
    compare the regression from two different sets of files.

    Args:
        dir_or_filename (str, required):
            The directory or filename containing the RelationshipStart.csv files.

        relationship_type (int, required):
            The type of relationship. Options: 0 (transitory), 1 (informal), 2 (marital), 3 (commercial).

        male_risk_value (str, optional):
            The risk value of the male in the relationship being plotted.  This will be
            either LOW, MEDIUM, or HIGH.  Capitalization matters.
            Default is LOW.

        show_avg_per_run (bool, optional):
            If 'dir_or_filename' is a directory, this will calculate the average number of
            relationships started at each timestep for the different files in the directory.
            Default is False.

        show_regression (bool, optional):
            If true, a least squares regression line will be calculated and shown on the plot.
            There will be one line for each risk value pair.
            Default is False.

        regression_dir (str, optional):
            If 'show_regression' is true and this provides a path to a directory, then a CSV
            file will be saved with the data points of the displayed regression lines.  The
            name of the file will be the relationship type and the male's risk value.  For
            example, COMMERCIAL-HIGH.csv.
            Default is None.

        img_dir (str, optional):
            Directory to save the images. If None, the images will not be saved and a window will be opened.

    Returns:
    """
    if not show_regression and regression_dir:
        raise ValueError("Regression directory is set but show_regression is False.\nYou need to show regression if you want to save it.")

    # -------------------------------------------------------------------
    # Get the list of RelationshipStart.csv files in the given directory.
    # -------------------------------------------------------------------
    dir_filenames = helpers.get_filenames(dir_or_filename=dir_or_filename,
                                          file_prefix="RelationshipStart",
                                          file_extension=".csv")

    # ----------------------------------------------------------------------------------
    # For the given relationship type, extract the number of relationships that started
    # at each time step for each risk value pair where the male risk value is constant.
    # This should result in three columns with counts of new relationships.
    # ----------------------------------------------------------------------------------
    combined_df = pd.DataFrame()
    for fn in dir_filenames:
        df = extract_assortivity_risk(start_rel_filename=fn,
                                      relationship_type=relationship_type,
                                      male_risk_value=male_risk_value)
        if len(combined_df.columns) == 0:
            combined_df.index = df.index
            for column_name in df.columns:
                name = column_name
                if not show_avg_per_run:
                    name = fn + "-" + name
                combined_df[name] = 0
        for column_name in df.columns:
            if show_avg_per_run:
                combined_df[column_name] = combined_df[column_name] + df[column_name]
            else:
                name = fn + "-" + column_name
                combined_df[name] = df[column_name]
        combined_df = combined_df.fillna(0)

    if show_avg_per_run:
        for column_name in combined_df.columns:
            combined_df[column_name] = combined_df[column_name] / len(dir_filenames)

    # ---------------------------------------------------------------------------------
    # For a given time step, convert the counts of new relationships to a fraction.
    # If we are showing all the files, then we need to calculate the fractions for each
    # file.  If doing the average, then it will be the fractions based on the averages.
    # ---------------------------------------------------------------------------------
    col_name_prefixs = [""]
    if not show_avg_per_run:
        col_name_prefixs = []
        for fn in dir_filenames:
            col_name_prefixs.append(fn + "-")

    total_df = pd.DataFrame()
    total_df.index = combined_df.index

    for prefix in col_name_prefixs:
        total_label = "total-" + prefix
        total_df[total_label] = 0
        for column_name in combined_df.columns:
            if prefix in column_name:
                total_df[total_label] = total_df[total_label] + combined_df[column_name]
        for column_name in combined_df.columns:
            if prefix in column_name:
                combined_df[column_name] = combined_df[column_name] / total_df[total_label]
            combined_df[column_name] = combined_df[column_name].fillna(0)

    # ---------------------------------------------------------------------------------
    # If requested, determine a least squares regression line for each risk value pair.
    # ---------------------------------------------------------------------------------
    expected_df = None
    if show_regression:
        expected_df = pd.DataFrame()
        expected_df.index = combined_df.index
        x = combined_df.index.values
        x = x.reshape(len(x), 1)
        for col_name in combined_df.columns:
            y = combined_df[col_name].values      # Dependent variable
            y = y.reshape(len(y), 1)
            model = LinearRegression()
            model.fit(x, y)
            expected_df["Regression-" + col_name] = model.predict(x)

    # convert relationship type to a string
    rel_str = "TRANSITORY"
    if relationship_type == 1:
        rel_str = "INFORMAL"
    elif relationship_type == 2:
        rel_str = "MARITAL"
    elif relationship_type == 3:
        rel_str = "COMMERCIAL"

    # save the regression data to a file
    if regression_dir and show_regression:
        os.makedirs(regression_dir, exist_ok=True)
        expected_df["Time"] = expected_df.index
        expected_df.to_csv(os.path.join(regression_dir, rel_str + "-" + male_risk_value + ".csv"), index=False)
        del expected_df["Time"]

    # ------------------------------
    # Create the titles for the plot
    # ------------------------------
    title2 = ""
    for col in combined_df.columns:
        title2 = title2 + col + "=" + f'{combined_df[col].mean():0.3f}' + " "

    title = ""
    if show_avg_per_run:
        title = "Average Per Run - "
    title = title + f"Relationship Per Risk Assortivity - {rel_str} - Male Risk = {male_risk_value}"

    # -------------
    # plot the data
    # -------------
    xy_plot.xy_plot(img_dir=img_dir,
                    df=combined_df,
                    expected_df=expected_df,
                    title_1=title,
                    title_2=title2,
                    x_axis_name="Days",
                    y_axis_name="Fraction of Relationships",
                    show_legend=show_avg_per_run,
                    show_markers=show_avg_per_run,
                    fraction_of_total=False,
                    min_x=None, max_x=None, min_y=None, max_y=None,
                    x_axis_as_log_scale=False,
                    y_axis_as_log_scale=False)
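
The count-to-fraction step in the listing above can be sketched in isolation. This is an illustrative reduction for a single set of three columns, assuming a hypothetical helper `counts_to_fractions`; the listing itself additionally groups columns by filename prefix when several files are shown.

```python
import pandas as pd


def counts_to_fractions(counts: pd.DataFrame) -> pd.DataFrame:
    """Divide each row by its total so the columns sum to 1 per time step."""
    totals = counts.sum(axis=1)
    # Rows whose total is 0 produce NaN (0 / 0); report those as 0.
    return counts.div(totals, axis=0).fillna(0)


# Made-up counts of new relationships at three time steps.
counts = pd.DataFrame(
    {"LOW-LOW": [2, 0, 1], "LOW-MEDIUM": [1, 0, 1], "LOW-HIGH": [1, 0, 2]},
    index=[1, 2, 3],
)
fractions = counts_to_fractions(counts)
print(fractions)
```

Filling the empty time step with 0 mirrors the `fillna(0)` calls in the listing, so a time step with no new relationships plots as 0 rather than leaving a gap.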

plot_relationship_assortivity_risk_all(dir_or_filename, regression_dir=None, img_dir=None)

Create a plot for each combination of male risk value and relationship type.

Parameters:

Name Type Description Default
dir_or_filename (str, required)

The directory containing the RelationshipStart.csv files, or the path to a specific file.

required
regression_dir str

If provided, a CSV file will be created for each plot where the CSV file has one column for each risk value combination - three columns because the male's value is fixed.

None
img_dir str

Directory to save the images. If None (the default), the images will not be saved and a window will be opened instead.

None

Returns:

Source code in emodpy_hiv/plotting/plot_relationship_start.py
def plot_relationship_assortivity_risk_all(dir_or_filename: str,
                                           regression_dir: str = None,
                                           img_dir: str = None):
    """
    Create a plot for each combination of male risk value and relationship type.

    Args:
        dir_or_filename (str, required):
            The directory containing the RelationshipStart.csv files, or the path to a specific file.

        regression_dir (str, optional):
            If provided, a CSV file will be created for each plot where the CSV file has one column
            for each risk value combination - three columns because the male's value is fixed.

        img_dir (str, optional):
            Directory to save the images.  If None, the images will not be saved and a window will be opened.
            Default is none - don't save image and open a window.

    Returns:
    """
    male_risk_values = ["LOW", "MEDIUM", "HIGH"]
    for rel_type in [0, 1, 2, 3]:
        for risk_value in male_risk_values:
            plot_relationship_assortivity_risk(dir_or_filename=dir_or_filename,
                                               relationship_type=rel_type,
                                               male_risk_value=risk_value,
                                               show_avg_per_run=True,
                                               show_regression=True,
                                               regression_dir=regression_dir,
                                               img_dir=img_dir)
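
Since the function loops over four relationship types and three male risk values, it makes twelve plotting calls. The sketch below enumerates the regression CSV names that would be expected in regression_dir when one is provided, following the `<relationship type>-<male risk value>.csv` naming used in the listing of plot_relationship_assortivity_risk. The helper `expected_regression_files` is hypothetical and does not call the library.

```python
from itertools import product

# Relationship type codes and labels as listed in the parameter descriptions.
REL_TYPE_NAMES = {0: "TRANSITORY", 1: "INFORMAL", 2: "MARITAL", 3: "COMMERCIAL"}
MALE_RISK_VALUES = ["LOW", "MEDIUM", "HIGH"]


def expected_regression_files():
    """Return one CSV name per (relationship type, male risk value) pair."""
    return [
        f"{REL_TYPE_NAMES[rel_type]}-{risk}.csv"
        for rel_type, risk in product(REL_TYPE_NAMES, MALE_RISK_VALUES)
    ]


files = expected_regression_files()
print(len(files), files[0], files[-1])
```

A list like this can serve as a quick check that a run produced output for every combination.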