Quality-of-life fork: refactored AutoModel, pyproject.toml, unit tests, CI by Doe-Johnevro · Pull Request #291 · lyogavin/airllm

Doe-Johnevro · 2026-06-04T02:32:15Z

Hi! This PR collects several small, low-risk improvements I've had sitting in my fork. Each commit is self-contained and the diff per commit is small, so this is easy to review or cherry-pick individually.

What's in here

**refactor(auto_model)** - `airllm/auto_model.py`
- Replace the if/elif architecture chain with a single ordered
  `_ARCHITECTURE_DISPATCH` table. Adding support for a new model is a
  one-line change.
- Use `logging` instead of `print`; remove the
  `print('using hf_token')` line that leaked credential presence.
- Raise `UnknownArchitectureError` (subclass of `ValueError`) when
  the model config is empty or the architecture isn't supported, with
  a message that lists every supported class and links to the issue
  tracker. Previously the code silently fell back to `AirLLMLlama2`,
  which is the #1 source of 'why doesn't my model work' reports.
- Add full type hints and a module docstring.
- Import `is_on_mac_os` from `.utils` (single source of truth) and
  remove the duplicate definitions in `auto_model.py` and the
  package `__init__.py`.
- Fix the typo 'artichitecture' in the warning message.
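
The dispatch idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the table entries, the `resolve_wrapper` helper, the case-insensitive substring matching, and the issue-tracker URL are assumptions made for the example, and wrapper classes are represented by their names so the snippet runs without `airllm` installed.

```python
from typing import Optional, Sequence, Tuple

# Assumed URL, derived from the repository name; not taken from the PR diff.
ISSUE_TRACKER_URL = "https://github.com/lyogavin/airllm/issues"

# Ordered (needle, wrapper name) pairs. With the case-insensitive matching
# assumed here, "qwen2" must come before "qwen" so the more specific needle wins.
_ARCHITECTURE_DISPATCH: Tuple[Tuple[str, str], ...] = (
    ("qwen2", "AirLLMQWen2"),
    ("qwen", "AirLLMQWen"),
    ("llama", "AirLLMLlama2"),
)


class UnknownArchitectureError(ValueError):
    """Raised when no dispatch entry matches the model's architecture."""


def _supported_hint() -> str:
    names = ", ".join(wrapper for _, wrapper in _ARCHITECTURE_DISPATCH)
    return f"Supported wrappers: {names}. Report missing models at {ISSUE_TRACKER_URL}"


def resolve_wrapper(architectures: Optional[Sequence[str]]) -> str:
    """Return the wrapper name for the first architecture listed in a config."""
    if not architectures:
        raise UnknownArchitectureError(
            "Model config lists no architectures. " + _supported_hint()
        )
    arch = architectures[0]
    lowered = arch.lower()
    for needle, wrapper in _ARCHITECTURE_DISPATCH:
        if needle in lowered:
            return wrapper
    # No silent fallback: an unmatched architecture is an error.
    raise UnknownArchitectureError(
        f"Unsupported architecture {arch!r}. " + _supported_hint()
    )
```

Because the error subclasses `ValueError`, callers that already catch `ValueError` around model loading would keep working under this design.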
**build: pyproject.toml** - `air_llm/pyproject.toml` (new) +
`air_llm/setup.py` (slimmed down to a 3-line shim)
- Move all package metadata to PEP 621 `pyproject.toml`.
- Require `transformers>=4.36` so the old
  `PostInstallCommand` hack that ran `pip install --upgrade
  transformers` from inside `setup.py install` is no longer needed.
  That hack broke PEP 517 build isolation, editable installs, conda
  environments and any other non-pip build backend.
- Move `bitsandbytes` to `[project.optional-dependencies]` since
  it isn't installable everywhere.
- Fix the wrong `License :: OSI Approved :: MIT License` classifier
  (the LICENSE file is Apache-2.0).
- Add `[tool.pytest.ini_options]` and `[tool.ruff]` for tests and
  lint.
**test: offline unit tests** - `air_llm/tests/test_auto_model_dispatch.py` (new)
- The existing `test_automodel.py` reaches out to Hugging Face to
  download real model configs, which makes it slow, flaky, and
  impossible to run in CI.
- The new test patches `AutoConfig.from_pretrained` and exercises
  the dispatch table directly: one assertion per supported
  architecture, regression tests for the most-specific-match
  ordering, and a check that the dispatch table has no duplicate
  needles.
- `air_llm/tests/conftest.py` (new) stubs the now-removed
  `optimum.bettertransformer` module so the package imports cleanly
  on modern `optimum` versions. (See CI below for why this matters.)
**ci: GitHub Actions** - `.github/workflows/ci.yml` (new)
- Matrix of Python 3.10 / 3.11 / 3.12 on Ubuntu.
- Installs torch (CPU build), then `pip install -e .[test,lint]`.
- Runs `ruff check` and `pytest`.
**docs(readme)** - adds a 'How it works' section explaining the
layer-sharded inference pipeline (shard-on-disk, stream layers,
CPU-pinned KV-cache, prefetching, optional compression) and a
'Troubleshooting' section for the four most common support
questions. TOC updated.
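
As a toy illustration of the "stream layers" and "prefetching" stages named above (and not AirLLM's actual implementation), the sketch below loads layer i+1 on a background thread while layer i is being applied. `run_streamed` and `load_layer` are hypothetical names, and layers are plain Python callables standing in for weights read from disk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Layer = Callable[[float], float]


def run_streamed(load_layer: Callable[[int], Layer], num_layers: int, x: float) -> float:
    """Apply layers 0..num_layers-1 to x, prefetching the next layer during compute."""
    if num_layers == 0:
        return x
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, 0)
        for i in range(num_layers):
            layer = pending.result()  # wait until the current layer is loaded
            if i + 1 < num_layers:
                # Start loading the next layer before computing with this one.
                pending = pool.submit(load_layer, i + 1)
            x = layer(x)
            del layer  # release the finished layer; only the prefetched one remains
    return x


if __name__ == "__main__":
    # Each toy "layer" doubles its input and adds the layer index.
    print(run_streamed(lambda i: (lambda v: v * 2 + i), 3, 1.0))
```

In a design like this only about two layers need to be resident at once, so the cost of a forward pass is dominated by how quickly layers can be read, which is consistent with the README's point about bandwidth.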

Verified locally

```
$ pytest tests/test_auto_model_dispatch.py -v
collected 8 items
... 8 passed in 5.09s
```

Risk

Each commit is independent and small. The refactor of `auto_model.py`
changes one user-visible behavior: instead of a silent fallback to
`AirLLMLlama2` for unknown architectures, it now raises
`UnknownArchitectureError` with a helpful message. The `AutoModel`
class is still constructed the same way (`from_pretrained`).

Happy to split this into multiple PRs if you'd prefer — just let me
know which pieces you're interested in.

…n-architecture error

The previous architecture detection was a long if/elif chain that silently fell back to AirLLMLlama2 for any unknown model, with a 'using hf_token' print that leaked credential presence and a typo ('artichitecture').

- Replace the if/elif chain with a single ordered _ARCHITECTURE_DISPATCH table (most-specific patterns first) so adding a new architecture is a one-line change.
- Add full type hints and a module docstring.
- Use the 'logging' module instead of 'print'; the 'using hf_token' line is removed (token presence is none of the caller's business).
- Introduce UnknownArchitectureError that lists supported architectures and links to the issue tracker, instead of a silent fallback.
- Import 'is_on_mac_os' from utils (single source of truth) and remove the duplicate definitions in auto_model.py and airllm/__init__.py.

The legacy setup.py had a PostInstallCommand that ran 'pip install --upgrade transformers' after every install. That works locally but breaks PEP 517 build isolation, editable installs, conda environments and any other build backend that doesn't shell out to pip.

- Replace the install-hooks-and-magic setup.py with a 3-line shim that just calls setup() and lets pyproject.toml own the metadata.
- Add air_llm/pyproject.toml (PEP 621) with:
  * build-system declaration (setuptools>=68)
  * project metadata incl. version, authors, classifiers, URLs
  * pinned runtime dependencies (transformers>=4.36 replaces the forced post-install upgrade and fixes the rope_scaling error)
  * optional-dependencies groups for compression, test, lint
  * [tool.setuptools.packages.find] so 'airllm' is discovered
  * [tool.pytest.ini_options] and [tool.ruff] for tests/lint

The classifier 'License :: OSI Approved :: MIT License' is also wrong (see LICENSE file); it is fixed to 'Apache Software License'.

The existing tests/test_automodel.py reaches out to Hugging Face to download real model configs, which makes it slow, flaky and impossible to run in CI without a network or a heavyweight model checkpoint.

Add tests/test_auto_model_dispatch.py that patches AutoConfig.from_pretrained and exercises the dispatch table directly:

- one test per supported architecture (Llama, Mistral, Mixtral, Qwen, Qwen2, Baichuan, ChatGLM, InternLM) asserts the correct wrapper class is returned
- regression test: Qwen2ForCausalLM must NOT fall through to QWen
- regression test: MixtralForCausalLM must NOT fall through to Mistral
- unknown architecture raises UnknownArchitectureError with a helpful message that includes the supported list
- empty / missing 'architectures' field raises UnknownArchitectureError
- the dispatch table has no duplicate needles
- hf_token kwarg is forwarded to AutoConfig (and absent means None)

Runs offline in under a second with 'pip install -e .[test] && pytest'.

The repo previously had no CI, so broken merges could land unnoticed.

Add .github/workflows/ci.yml that:

- runs on push to main and on every pull request
- tests against a matrix of Python 3.10, 3.11 and 3.12
- installs torch (CPU build, no CUDA needed) then the package with [test,lint] extras from air_llm/pyproject.toml
- runs 'ruff check' for lint
- runs 'pytest' (picks up the new offline test_auto_model_dispatch.py)

The README jumps straight from 'Supported models' to 'FAQ' and assumes the reader already understands the layer-sharded inference approach.

- New 'How it works' section explains the five-stage pipeline (shard-on-disk, stream layers, CPU-pinned KV-cache, prefetching, optional bitsandbytes compression) and why inference is bandwidth-bound rather than compute-bound.
- New 'Troubleshooting' section collects the four most common support questions and points each at the right fix or issue tracker.
- TOC is updated with both new sections so they're discoverable.

Mentions the new UnknownArchitectureError raised by the refactored AutoModel so users can self-serve when they hit an unsupported architecture.

…dule

airllm/airllm_base.py does 'from optimum.bettertransformer import BetterTransformer' at import time. In recent optimum releases (>=1.14 or so) that module was removed, so a vanilla 'pip install airllm' on a fresh environment fails at import time with ModuleNotFoundError, even though the BetterTransformer symbol is only used inside an internal class that the unit tests never touch.

Add tests/conftest.py that pre-populates sys.modules with a MagicMock for the missing module. Pytest auto-loads conftest.py so the test suite now runs in any environment, regardless of which optimum version is installed.

Verified locally: 'pytest tests/test_auto_model_dispatch.py' -> 8 passed.
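
The stubbing trick described in this commit can be sketched as below. This is an assumption-laden illustration rather than the contents of the PR's conftest.py: `stub_missing_module` is a hypothetical helper, and unlike an unconditional stub it only registers a mock when the real import fails.

```python
import importlib
import sys
from unittest.mock import MagicMock


def stub_missing_module(name: str) -> bool:
    """Register a MagicMock under `name` if the real module cannot be imported.

    Returns True when a stub was installed, False when the real module exists.
    """
    try:
        importlib.import_module(name)
        return False
    except ImportError:
        # The import system consults sys.modules first, so later imports of
        # `name` receive this mock instead of raising ModuleNotFoundError.
        sys.modules[name] = MagicMock(name=name)
        return True


# In a conftest.py this would run at import time, before the package under
# test is imported, for example:
# stub_missing_module("optimum.bettertransformer")
```

Attribute access on the mock (such as `BetterTransformer`) returns another mock, which is enough for code paths that import the symbol but never call it in the tests.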

Doe-Johnevro added 6 commits June 4, 2026 05:19

Doe-Johnevro wants to merge 6 commits into lyogavin:main from Doe-Johnevro:main
