[BUG] In QUALX, the value of sql_ops_xxx is binary. #2002

@fang-tech

Describe the bug
In the qualx tool's featurizer (default.py), each sql_ops_xxx feature only ever takes the value 0 or 1.
However, the comments and variable names indicate that each of these features should hold the number of occurrences of the corresponding node.

Steps/Code to reproduce bug

    # count occurrences
    sql_ops_counter['counter'] = 1  # counter col for pivot_table()
    sql_ops_list = list(sql_ops_counter['nodeName'].unique())  # unique sql ops
    sql_ops_map = {cc: 'sqlOp_' + cc for cc in sql_ops_list}  # add prefix to sql ops
    sql_ops_counter['nodeName'] = sql_ops_counter['nodeName'].map(sql_ops_map)
    sql_ops_list = list(
        sql_ops_counter['nodeName'].unique()
    )  # update unique sql ops list

    # pivot sql ops rows to columns
    sql_ops_counter = (
        pd.pivot_table(
            sql_ops_counter,
            index=['appId', 'sqlID'],
            values='counter',
            columns='nodeName',
        )
        .fillna(0)
        .astype(int)
        .reset_index()
    )

Expected behavior
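The sql_ops counts should be aggregated with sum. pd.pivot_table defaults to aggfunc='mean', and the mean of a column of 1s is always 1, so the call needs an explicit aggfunc, for example: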

        pd.pivot_table(
            sql_ops_counter,
            index=['appId', 'sqlID'],
            values='counter',
            columns='nodeName',
            aggfunc='sum'  # Sum to count total occurrences
        )
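A minimal sketch of the difference, using made-up toy data (the app ID, SQL ID, and operator names below are hypothetical, and `pivot_ops` is a helper written only for this illustration, not a function in qualx). It mirrors the pivot from the snippet above and compares the default aggregation with `aggfunc='sum'`:

```python
import pandas as pd

# Hypothetical toy data: one query whose plan contains
# 'Project' three times and 'Filter' once.
sql_ops_counter = pd.DataFrame({
    'appId': ['app-1'] * 4,
    'sqlID': [0] * 4,
    'nodeName': ['sqlOp_Project', 'sqlOp_Project', 'sqlOp_Project', 'sqlOp_Filter'],
})
sql_ops_counter['counter'] = 1  # counter col for pivot_table()


def pivot_ops(df, **kwargs):
    """Illustrative helper: pivot sql ops rows to columns, as in the issue."""
    return (
        pd.pivot_table(
            df,
            index=['appId', 'sqlID'],
            values='counter',
            columns='nodeName',
            **kwargs,
        )
        .fillna(0)
        .astype(int)
        .reset_index()
    )


# Default aggfunc is 'mean': the mean of [1, 1, 1] collapses to 1.
default_pivot = pivot_ops(sql_ops_counter)

# Explicit sum: [1, 1, 1] adds up to the real occurrence count.
summed_pivot = pivot_ops(sql_ops_counter, aggfunc='sum')

print(default_pivot['sqlOp_Project'].iloc[0])
print(summed_pivot['sqlOp_Project'].iloc[0])
```

With the default aggregation, an operator that appears several times in a query is indistinguishable from one that appears once; with `aggfunc='sum'` the column carries the actual count.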

Environment details

  • Environment location: local (model training)

Additional context
none
