"f1" score, the model still considers "accuracy" #646 #822

Open
Jdmesa wants to merge 1 commit into mljar:master from Jdmesa:fix/f1_score
Jdmesa commented May 29, 2026

Problem

When using ml_task="multiclass_classification" and setting eval_metric="f1",
the XGBoost algorithm was internally defaulting to accuracy as its evaluation
metric instead of respecting the user's choice.

This happened because the xgboost_eval_metric() function in
supervised/algorithms/xgboost.py only handled the logloss → mlogloss
mapping for multiclass, leaving f1 and accuracy without explicit routing
as custom metrics. As a result, XGBoost did not recognize f1 as a native
metric for multiclass tasks and silently fell back to accuracy.
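To illustrate why a silent fallback to accuracy matters, here is a small sketch on made-up, imbalanced labels. The helper names and the data are invented for this example, and macro averaging is an assumption; the averaging mode the library actually uses for multiclass f1 is not stated in this PR.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed here)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0.0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative data: class 0 dominates, and the model always predicts class 0.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("macro f1:", macro_f1(y_true, y_pred))
```

On data like this the two metrics disagree sharply, so a model tuned against accuracy can look good while performing poorly on the metric the user asked for.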

Fix

Added an explicit elif branch in xgboost_eval_metric() to handle f1
and accuracy as custom metrics when ml_task is multiclass_classification,
consistent with how lightgbm_eval_metric() already handles this case in
supervised/algorithms/lightgbm.py.
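The routing described above could look roughly like the sketch below. This is not the code from supervised/algorithms/xgboost.py: the function name `xgboost_eval_metric_sketch`, the `CUSTOM_MULTICLASS_METRICS` set, and the `(name, is_custom)` return shape are all assumptions made for illustration.

```python
# Hypothetical sketch of the routing idea; names and return shape are assumed.
MULTICLASS_CLASSIFICATION = "multiclass_classification"

# Assumption: metrics that must be evaluated through a custom callback
# for multiclass instead of being passed to XGBoost by name.
CUSTOM_MULTICLASS_METRICS = {"f1", "accuracy"}


def xgboost_eval_metric_sketch(ml_task, automl_eval_metric):
    """Return (metric_name, is_custom) for the given task and user metric."""
    if ml_task == MULTICLASS_CLASSIFICATION:
        if automl_eval_metric == "logloss":
            # XGBoost's native multiclass log loss is named "mlogloss"
            return "mlogloss", False
        elif automl_eval_metric in CUSTOM_MULTICLASS_METRICS:
            # Explicit branch: keep the user's metric and flag it as custom
            return automl_eval_metric, True
    # Everything else is passed through unchanged
    return automl_eval_metric, False


print(xgboost_eval_metric_sketch("multiclass_classification", "f1"))
print(xgboost_eval_metric_sketch("multiclass_classification", "logloss"))
```

The point of the explicit branch is that the requested metric is never replaced implicitly: either it maps to a native XGBoost name, or it is marked for custom evaluation.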

Changes

  • supervised/algorithms/xgboost.py: updated xgboost_eval_metric() to
    explicitly route f1 and accuracy as custom metrics for multiclass.
  • tests/tests_fairness/test_multi_class_classification.py: added
    test_with_f1_metric() to cover the multiclass case with eval_metric="f1"
    and sensitive features.

When ml_task is multiclass_classification and the user selects 'f1'
as eval_metric, XGBoost was defaulting to 'accuracy' internally
because 'f1' is not a native XGBoost metric for multiclass.

This fix ensures f1 and accuracy are explicitly routed as custom
metrics in the multiclass case, consistent with how LightGBM handles it.

Adds a test case covering eval_metric='f1' in multiclass with
sensitive features.