Quality-of-life fork: refactored AutoModel, pyproject.toml, unit tests, CI #291

Open
Doe-Johnevro wants to merge 6 commits into lyogavin:main from Doe-Johnevro:main

@Doe-Johnevro

Hi! This PR collects several small, low-risk improvements I've had sitting in my fork. Each commit is self-contained and the diff per commit is small, so this is easy to review or cherry-pick individually.

What's in here

  1. **`refactor(auto_model)`**: `airllm/auto_model.py`

    • Replace the if/elif architecture chain with a single ordered
      `_ARCHITECTURE_DISPATCH` table. Adding support for a new model is a
      one-line change.
    • Use `logging` instead of `print`; remove the
      `print('using hf_token')` line that leaked credential presence.
    • Raise `UnknownArchitectureError` (subclass of `ValueError`) when
      the model config is empty or the architecture isn't supported, with
      a message that lists every supported class and links to the issue
      tracker. Previously the code silently fell back to `AirLLMLlama2`,
      which is the number-one source of 'why doesn't my model work' reports.
    • Add full type hints and a module docstring.
    • Import `is_on_mac_os` from `.utils` (single source of truth) and
      remove the duplicate definitions in `auto_model.py` and the
      package `__init__.py`.
    • Fix the typo 'artichitecture' in the warning message.
  2. **`build: pyproject.toml`**: `air_llm/pyproject.toml` (new) +
    `air_llm/setup.py` (slimmed down to a 3-line shim)

    • Move all package metadata to PEP 621 `pyproject.toml`.
    • Require `transformers>=4.36` so the old
      `PostInstallCommand` hack that ran `pip install --upgrade
      transformers` from inside `setup.py install` is no longer needed.
      That hack broke PEP 517 build isolation, editable installs, conda
      environments and any build backend that doesn't shell out to pip.
    • Move `bitsandbytes` to `[project.optional-dependencies]` since
      it can't be installed everywhere.
    • Fix the wrong `License :: OSI Approved :: MIT License` classifier
      (the LICENSE file is Apache-2.0).
    • Add `[tool.pytest.ini_options]` and `[tool.ruff]` for tests and
      lint.
  3. **`test: offline unit tests`**: `air_llm/tests/test_auto_model_dispatch.py` (new)

    • The existing `test_automodel.py` reaches out to Hugging Face to
      download real model configs, which makes it slow, flaky, and
      impossible to run in CI.
    • The new test patches `AutoConfig.from_pretrained` and exercises
      the dispatch table directly: one assertion per supported
      architecture, regression tests for the most-specific-match
      ordering, and a check that the dispatch table has no duplicate
      needles.
    • `air_llm/tests/conftest.py` (new) stubs the now-removed
      `optimum.bettertransformer` module so the package imports cleanly
      on modern `optimum` versions. (See CI below for why this matters.)
  4. **`ci: GitHub Actions`**: `.github/workflows/ci.yml` (new)

    • Matrix of Python 3.10 / 3.11 / 3.12 on Ubuntu.
    • Installs torch (CPU build), then `pip install -e .[test,lint]`.
    • Runs `ruff check` and `pytest`.
  5. **`docs(readme)`**: adds a 'How it works' section explaining the
    layer-sharded inference pipeline (shard-on-disk, stream layers,
    CPU-pinned KV-cache, prefetching, optional compression) and a
    'Troubleshooting' section for the four most common support
    questions. TOC updated.

Verified locally

```
$ pytest tests/test_auto_model_dispatch.py -v
collected 8 items
... 8 passed in 5.09s
```

Risk

Each commit is independent and small. The refactor of `auto_model.py`
changes one user-visible behavior: instead of a silent fallback to
`AirLLMLlama2` for unknown architectures, it now raises
`UnknownArchitectureError` with a helpful message. The `AutoModel`
class is still constructed the same way (`from_pretrained`).
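If a downstream caller relied on the old fallback, one way to keep that behavior explicit is sketched below. Because the new exception subclasses `ValueError`, an existing `except ValueError` handler still catches it. The `resolve` function here is a hypothetical stand-in for the real detection logic, and the exception class is redefined only to keep the snippet self-contained.

```python
import logging

logger = logging.getLogger(__name__)


class UnknownArchitectureError(ValueError):
    """Stand-in for the exception this PR introduces."""


def resolve(architecture: str) -> str:
    """Hypothetical resolver that only knows a single architecture."""
    if "Llama" in architecture:
        return "AirLLMLlama2"
    raise UnknownArchitectureError(f"unsupported architecture: {architecture!r}")


def resolve_or_legacy_fallback(architecture: str) -> str:
    """Opt back in to the old fallback explicitly, with a visible warning."""
    try:
        return resolve(architecture)
    except ValueError as exc:  # also catches UnknownArchitectureError
        logger.warning("%s; falling back to AirLLMLlama2", exc)
        return "AirLLMLlama2"
```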

Happy to split this into multiple PRs if you'd prefer — just let me
know which pieces you're interested in.

…n-architecture error

The previous architecture detection was a long if/elif chain that silently
fell back to AirLLMLlama2 for any unknown model, with a 'using hf_token'
print that leaked credential presence and a typo ('artichitecture').

- Replace the if/elif chain with a single ordered _ARCHITECTURE_DISPATCH
  table (most-specific patterns first) so adding a new architecture is a
  one-line change.
- Add full type hints and a module docstring.
- Use the 'logging' module instead of 'print'; the 'using hf_token' line
  is removed (token presence is none of the caller's business).
- Introduce UnknownArchitectureError that lists supported architectures
  and links to the issue tracker, instead of a silent fallback.
- Import 'is_on_mac_os' from utils (single source of truth) and remove
  the duplicate definitions in auto_model.py and airllm/__init__.py.

The legacy setup.py had a PostInstallCommand that ran 'pip install
--upgrade transformers' after every install. That works locally but
breaks PEP 517 build isolation, editable installs, conda environments
and any other build backend that doesn't shell out to pip.

- Replace the install-hooks-and-magic setup.py with a 3-line shim that
  just calls setup() and lets pyproject.toml own the metadata.
- Add air_llm/pyproject.toml (PEP 621) with:
  * build-system declaration (setuptools>=68)
  * project metadata incl. version, authors, classifiers, URLs
  * pinned runtime dependencies (transformers>=4.36 replaces the
    forced post-install upgrade and fixes the rope_scaling error)
  * optional-dependencies groups for compression, test, lint
  * [tool.setuptools.packages.find] so 'airllm' is discovered
  * [tool.pytest.ini_options] and [tool.ruff] for tests/lint
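
A rough sketch of the layout this list describes, embedded in a Python string so it can be parsed and sanity-checked with the standard-library `tomllib` (Python 3.11+). This is not the file from the commit: the version is a placeholder, and the contents of the extras groups, the `include` pattern, `testpaths` and the ruff `line-length` are assumptions inferred from the group names.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # Python 3.10: needs the third-party 'tomli' backport
    import tomli as tomllib

PYPROJECT_SKETCH = """
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "airllm"
version = "0.0.0"  # placeholder, not the real version
dependencies = ["transformers>=4.36"]
classifiers = ["License :: OSI Approved :: Apache Software License"]

[project.optional-dependencies]
compression = ["bitsandbytes"]  # assumed contents
test = ["pytest"]               # assumed contents
lint = ["ruff"]                 # assumed contents

[tool.setuptools.packages.find]
include = ["airllm*"]  # assumed pattern

[tool.pytest.ini_options]
testpaths = ["tests"]  # assumed

[tool.ruff]
line-length = 100  # placeholder
"""

config = tomllib.loads(PYPROJECT_SKETCH)
extras = config["project"]["optional-dependencies"]
```

Parsing the sketch this way catches TOML syntax slips early, though it says nothing about whether setuptools accepts the metadata.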

The classifier 'License :: OSI Approved :: MIT License' is also wrong
(the LICENSE file is Apache-2.0), so it is fixed to 'Apache Software License'.

The existing tests/test_automodel.py reaches out to Hugging Face to
download real model configs, which makes it slow, flaky and impossible
to run in CI without a network or a heavyweight model checkpoint.

Add tests/test_auto_model_dispatch.py that patches
AutoConfig.from_pretrained and exercises the dispatch table directly:

- one test per supported architecture (Llama, Mistral, Mixtral,
  Qwen, Qwen2, Baichuan, ChatGLM, InternLM) asserts the correct
  wrapper class is returned
- regression test: Qwen2ForCausalLM must NOT fall through to QWen
- regression test: MixtralForCausalLM must NOT fall through to Mistral
- unknown architecture raises UnknownArchitectureError with a helpful
  message that includes the supported list
- empty / missing 'architectures' field raises UnknownArchitectureError
- the dispatch table has no duplicate needles
- hf_token kwarg is forwarded to AutoConfig (and absent means None)
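
The patching approach behind these tests can be sketched with the standard library alone. The snippet below is not the test file from the commit: `ConfigLoader` is a hypothetical stand-in for `AutoConfig` so the example runs without transformers installed, and `first_architecture` and the `token` keyword are illustrative names.

```python
from types import SimpleNamespace
from unittest import mock


class ConfigLoader:
    """Hypothetical stand-in for AutoConfig; the real loader hits the network."""

    @staticmethod
    def from_pretrained(repo_id, token=None):
        raise RuntimeError("network access is not allowed in unit tests")


def first_architecture(repo_id, hf_token=None):
    """Return the first entry of the config's 'architectures' list."""
    config = ConfigLoader.from_pretrained(repo_id, token=hf_token)
    architectures = getattr(config, "architectures", None) or []
    if not architectures:
        raise ValueError(f"{repo_id!r} declares no architectures")
    return architectures[0]


def test_reads_architecture_without_network():
    fake_config = SimpleNamespace(architectures=["Qwen2ForCausalLM"])
    with mock.patch.object(
        ConfigLoader, "from_pretrained", return_value=fake_config
    ) as loader:
        assert first_architecture("any/repo", hf_token="secret") == "Qwen2ForCausalLM"
    # The token must be forwarded unchanged to the loader.
    loader.assert_called_once_with("any/repo", token="secret")


def test_token_defaults_to_none():
    fake_config = SimpleNamespace(architectures=["LlamaForCausalLM"])
    with mock.patch.object(
        ConfigLoader, "from_pretrained", return_value=fake_config
    ) as loader:
        first_architecture("any/repo")
    loader.assert_called_once_with("any/repo", token=None)
```

Outside the `with` block the original loader is restored, so a test that forgets to patch fails loudly instead of reaching the network.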

Runs offline in a few seconds with 'pip install -e .[test] && pytest'.

The repo previously had no CI, so broken merges could land unnoticed.

Add .github/workflows/ci.yml that:

- runs on push to main and on every pull request
- tests against a matrix of Python 3.10, 3.11 and 3.12
- installs torch (CPU build, no CUDA needed) then the package with
  [test,lint] extras from air_llm/pyproject.toml
- runs 'ruff check' for lint
- runs 'pytest' (picks up the new offline test_auto_model_dispatch.py)

The README jumps straight from 'Supported models' to 'FAQ' and assumes
the reader already understands the layer-sharded inference approach.

- New 'How it works' section explains the five-stage pipeline
  (shard-on-disk, stream layers, CPU-pinned KV-cache, prefetching,
  optional bitsandbytes compression) and why inference is bandwidth-
  bound rather than compute-bound.
- New 'Troubleshooting' section collects the four most common support
  questions and points each at the right fix or issue tracker.
- TOC is updated with both new sections so they're discoverable.
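
As a toy illustration of the first two stages (shard-on-disk, stream layers), the sketch below writes each layer to its own file and loads one layer at a time during the forward pass. It is not AirLLM's implementation: the JSON shards, the plain matrix-vector "layers" and all function names are assumptions, and the KV-cache, prefetching and compression stages are left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, List

Matrix = List[List[float]]
Vector = List[float]


def shard_to_disk(layers: List[Matrix], directory: Path) -> List[Path]:
    """Write each layer's weights to its own file and return the paths in order."""
    paths = []
    for index, weights in enumerate(layers):
        path = directory / f"layer_{index}.json"
        path.write_text(json.dumps(weights))
        paths.append(path)
    return paths


def stream_layers(paths: List[Path]) -> Iterator[Matrix]:
    """Yield one layer at a time so only a single layer is held in memory."""
    for path in paths:
        yield json.loads(path.read_text())


def apply_layer(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product standing in for a transformer block."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def forward(paths: List[Path], x: Vector) -> Vector:
    """Run the input through every layer, loading each shard only when needed."""
    for weights in stream_layers(paths):
        x = apply_layer(weights, x)
    return x


toy_layers = [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [0.0, 1.0]]]
with tempfile.TemporaryDirectory() as tmp:
    shard_paths = shard_to_disk(toy_layers, Path(tmp))
    result = forward(shard_paths, [1.0, 3.0])
```

Every forward pass re-reads each shard, which is why throughput in this scheme tracks read bandwidth rather than arithmetic speed.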

Mentions the new UnknownArchitectureError raised by the refactored
AutoModel so users can self-serve when they hit an unsupported
architecture.

…dule

airllm/airllm_base.py does
    from optimum.bettertransformer import BetterTransformer
at import time. In recent optimum releases (>=1.14 or so) that module
was removed, so a vanilla 'pip install airllm' on a fresh environment
fails at import time with ModuleNotFoundError — even though the
BetterTransformer symbol is only used inside an internal class that the
unit tests never touch.

Add tests/conftest.py that pre-populates sys.modules with a MagicMock
for the missing module. Pytest auto-loads conftest.py so the test
suite now runs in any environment, regardless of which optimum version
is installed.
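
The stubbing trick can be sketched as follows. This is not the conftest.py from the commit: it uses a made-up module name, `legacy_pkg.removed_module`, so the example behaves the same whether or not optimum is installed, and the helper name `ensure_importable` is illustrative.

```python
import importlib
import sys
from unittest.mock import MagicMock


def ensure_importable(dotted_name: str):
    """Return the real module if it imports, else register a MagicMock stub."""
    try:
        return importlib.import_module(dotted_name)
    except ModuleNotFoundError:
        stub = MagicMock(name=dotted_name)
        # Later imports find the stub in sys.modules and never search the disk.
        sys.modules[dotted_name] = stub
        return stub


# Hypothetical module that does not exist in this environment.
stub = ensure_importable("legacy_pkg.removed_module")

# This import would otherwise fail; attribute access on the stub
# returns another MagicMock, so any imported name resolves.
from legacy_pkg.removed_module import Helper
```

Trying the real import first means an environment that still ships the module keeps using it, and only the missing case gets a stub.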

Verified locally: 'pytest tests/test_auto_model_dispatch.py' -> 8 passed.