Description
Describe the bug
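In the qualx tool's featurizer (`default.py`), every `sql_ops_xxx` feature takes only the values 0 or 1.
However, the comments and variable names indicate that each of these features should be the number of occurrences of the corresponding SQL node. The cause is that `pd.pivot_table` is called without `aggfunc`, so it uses its default, `'mean'`. Because every row's `counter` is 1, the mean is always 1 for any node that appears in a given `(appId, sqlID)`, and 0 (after `fillna(0)`) for any node that does not.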
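A minimal, self-contained reproduction of the effect described above. The input data, the `appId`/`sqlID` values, the node names, and the `pivot_ops` helper are all hypothetical; the `sqlOp_` prefixing step from `default.py` is omitted because it does not affect the aggregation. The helper mirrors the pivot in the featurizer and only varies `aggfunc`.

```python
import pandas as pd

# Hypothetical toy input: one row per plan node, shaped like the
# featurizer's sql_ops_counter frame. IDs and node names are made up.
sql_ops_counter = pd.DataFrame(
    {
        "appId": ["app-1", "app-1", "app-1", "app-1", "app-2"],
        "sqlID": [0, 0, 0, 1, 0],
        "nodeName": ["Filter", "Filter", "Project", "Filter", "Project"],
    }
)
sql_ops_counter["counter"] = 1  # counter col for pivot_table()


def pivot_ops(df, **kwargs):
    """Hypothetical helper: pivot node names into per-(appId, sqlID) columns."""
    return (
        pd.pivot_table(
            df,
            index=["appId", "sqlID"],
            values="counter",
            columns="nodeName",
            **kwargs,
        )
        .fillna(0)  # nodes absent from a query become 0
        .astype(int)
        .reset_index()
    )


current = pivot_ops(sql_ops_counter)               # default aggfunc='mean': 0/1 only
fixed = pivot_ops(sql_ops_counter, aggfunc="sum")  # sums the 1s: occurrence counts

print(current)
print(fixed)
```

Here `app-1`/`sqlID 0` contains two `Filter` nodes, yet its `Filter` value is 1 in `current` and 2 in `fixed`.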
Steps/Code to reproduce bug
```python
# count occurrences
sql_ops_counter['counter'] = 1  # counter col for pivot_table()
sql_ops_list = list(sql_ops_counter['nodeName'].unique())  # unique sql ops
sql_ops_map = {cc: 'sqlOp_' + cc for cc in sql_ops_list}  # add prefix to sql ops
sql_ops_counter['nodeName'] = sql_ops_counter['nodeName'].map(sql_ops_map)
sql_ops_list = list(
    sql_ops_counter['nodeName'].unique()
)  # update unique sql ops list
# pivot sql ops rows to columns
sql_ops_counter = (
    pd.pivot_table(
        sql_ops_counter,
        index=['appId', 'sqlID'],
        values='counter',
        columns='nodeName',
    )
    .fillna(0)
    .astype(int)
    .reset_index()
)
```

Expected behavior
The SQL op counts should be aggregated with the `sum` function, for example:

```python
pd.pivot_table(
    sql_ops_counter,
    index=['appId', 'sqlID'],
    values='counter',
    columns='nodeName',
    aggfunc='sum',  # Sum to count total occurrences
)
```

Environment details (please complete the following information)
- Environment location: local (used for model training)
Additional context
none