Quality-of-life fork: refactored AutoModel, pyproject.toml, unit tests, CI#291
Open
Doe-Johnevro wants to merge 6 commits into
…n-architecture error
The previous architecture detection was a long if/elif chain that silently
fell back to AirLLMLlama2 for any unknown model, with a 'using hf_token'
print that leaked credential presence and a typo ('artichitecture').
- Replace the if/elif chain with a single ordered _ARCHITECTURE_DISPATCH
table (most-specific patterns first) so adding a new architecture is a
one-line change.
- Add full type hints and a module docstring.
- Use the 'logging' module instead of 'print'; the 'using hf_token' line
is removed (token presence is none of the caller's business).
- Introduce UnknownArchitectureError that lists supported architectures
and links to the issue tracker, instead of a silent fallback.
- Import 'is_on_mac_os' from utils (single source of truth) and remove
the duplicate definitions in auto_model.py and airllm/__init__.py.
The legacy setup.py had a PostInstallCommand that ran 'pip install
--upgrade transformers' after every install. That works locally but
breaks PEP 517 build isolation, editable installs, conda environments
and any other build backend that doesn't shell out to pip.
- Replace the install-hooks-and-magic setup.py with a 3-line shim that
just calls setup() and lets pyproject.toml own the metadata.
- Add air_llm/pyproject.toml (PEP 621) with:
  * build-system declaration (setuptools>=68)
  * project metadata incl. version, authors, classifiers, URLs
  * runtime dependencies with lower bounds (transformers>=4.36 replaces
    the forced post-install upgrade and fixes the rope_scaling error)
  * optional-dependencies groups for compression, test, lint
  * [tool.setuptools.packages.find] so 'airllm' is discovered
  * [tool.pytest.ini_options] and [tool.ruff] for tests/lint
The classifier 'License :: OSI Approved :: MIT License' is also wrong
(see LICENSE file) — fixed to 'Apache Software License'.
The existing tests/test_automodel.py reaches out to Hugging Face to
download real model configs, which makes it slow, flaky and impossible
to run in CI without a network or a heavyweight model checkpoint.
Add tests/test_auto_model_dispatch.py that patches
AutoConfig.from_pretrained and exercises the dispatch table directly:
- one test per supported architecture (Llama, Mistral, Mixtral, Qwen,
  Qwen2, Baichuan, ChatGLM, InternLM) asserts the correct wrapper class
  is returned
- regression test: Qwen2ForCausalLM must NOT fall through to QWen
- regression test: MixtralForCausalLM must NOT fall through to Mistral
- unknown architecture raises UnknownArchitectureError with a helpful
  message that includes the supported list
- empty / missing 'architectures' field raises UnknownArchitectureError
- the dispatch table has no duplicate needles
- hf_token kwarg is forwarded to AutoConfig (and absent means None)
Runs offline in under a second with 'pip install -e .[test] && pytest'.
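The patching approach described in this commit might look something like the sketch below. To keep it runnable without `transformers` or `airllm` installed, `AutoConfig` and `pick_wrapper` are local stand-ins, and the `token=` keyword used for forwarding is an assumption; in the real test the patch target would be the `AutoConfig` used by the package.

```python
from types import SimpleNamespace
from unittest import mock


class AutoConfig:
    """Stand-in for transformers.AutoConfig so the sketch needs no install."""

    @classmethod
    def from_pretrained(cls, model_id, token=None):
        raise RuntimeError("network access attempted")


def pick_wrapper(model_id, hf_token=None):
    """Hypothetical entry point: load the config, then map it to a wrapper."""
    config = AutoConfig.from_pretrained(model_id, token=hf_token)
    table = {"Qwen2ForCausalLM": "AirLLMQWen2", "LlamaForCausalLM": "AirLLMLlama2"}
    return table[config.architectures[0]]


def test_qwen2_dispatch_runs_offline():
    fake_config = SimpleNamespace(architectures=["Qwen2ForCausalLM"])
    # Swap the loader for a mock that returns a canned config object.
    with mock.patch.object(
        AutoConfig, "from_pretrained", return_value=fake_config
    ) as patched:
        assert pick_wrapper("some/model") == "AirLLMQWen2"
    # No token was passed, so the loader should have received None.
    patched.assert_called_once_with("some/model", token=None)


def test_token_is_forwarded():
    fake_config = SimpleNamespace(architectures=["LlamaForCausalLM"])
    with mock.patch.object(
        AutoConfig, "from_pretrained", return_value=fake_config
    ) as patched:
        assert pick_wrapper("some/model", hf_token="secret") == "AirLLMLlama2"
    patched.assert_called_once_with("some/model", token="secret")
```

Because the loader is replaced before it is called, the unpatched `RuntimeError` path is never reached, which is what makes the tests independent of the network.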
The repo previously had no CI, so broken merges could land unnoticed.
Add .github/workflows/ci.yml that:
- runs on push to main and on every pull request
- tests against a matrix of Python 3.10, 3.11 and 3.12
- installs torch (CPU build, no CUDA needed) then the package with
  [test,lint] extras from air_llm/pyproject.toml
- runs 'ruff check' for lint
- runs 'pytest' (picks up the new offline test_auto_model_dispatch.py)
The README jumps straight from 'Supported models' to 'FAQ' and assumes
the reader already understands the layer-sharded inference approach.
- New 'How it works' section explains the five-stage pipeline
  (shard-on-disk, stream layers, CPU-pinned KV-cache, prefetching,
  optional bitsandbytes compression) and why inference is
  bandwidth-bound rather than compute-bound.
- New 'Troubleshooting' section collects the four most common support
  questions and points each at the right fix or issue tracker.
- TOC is updated with both new sections so they're discoverable.
Mentions the new UnknownArchitectureError raised by the refactored
AutoModel so users can self-serve when they hit an unsupported
architecture.
…dule
airllm/airllm_base.py does
from optimum.bettertransformer import BetterTransformer
at import time. In recent optimum releases (>=1.14 or so) that module
was removed, so a vanilla 'pip install airllm' on a fresh environment
fails at import time with ModuleNotFoundError — even though the
BetterTransformer symbol is only used inside an internal class that the
unit tests never touch.
Add tests/conftest.py that pre-populates sys.modules with a MagicMock
for the missing module. Pytest auto-loads conftest.py so the test
suite now runs in any environment, regardless of which optimum version
is installed.
Verified locally: 'pytest tests/test_auto_model_dispatch.py' -> 8 passed.
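A conftest.py along the lines described here could be as small as the sketch below. It is an illustration of the technique rather than the file from the PR; in particular, using `setdefault` so that an already-imported real module is left alone is a choice made for this example.

```python
# tests/conftest.py (sketch)
import sys
from unittest.mock import MagicMock

# Register a stand-in under the dotted name before any test imports the
# package. The import system consults sys.modules first, so a later
# "from optimum.bettertransformer import BetterTransformer" resolves to an
# attribute of this mock instead of raising ModuleNotFoundError.
# setdefault keeps a real module in place if one was already imported.
sys.modules.setdefault("optimum.bettertransformer", MagicMock())
```

The stub only makes the import succeed; any code path that actually calls into BetterTransformer would be talking to a mock, so it suits tests that never exercise that class.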
Hi! This PR collects several small, low-risk improvements I've had sitting in my fork. Each commit is self-contained and the diff per commit is small, so this is easy to review or cherry-pick individually.
What's in here
**`refactor(auto_model)`**: `airllm/auto_model.py`
- Replaces the if/elif chain with a single ordered
  `_ARCHITECTURE_DISPATCH` table. Adding support for a new model is a
  one-line change.
- Removes the `print('using hf_token')` line that leaked credential presence.
- Raises `UnknownArchitectureError` when the `architectures` field of
  the model config is empty or the architecture isn't supported, with
  a message that lists every supported class and links to the issue
  tracker. Previously the code silently fell back to `AirLLMLlama2`,
  which is the #1 source of 'why doesn't my model work' reports.
- Imports `is_on_mac_os` from `utils` (single source of truth) and
  removes the duplicate definitions in `auto_model.py` and the
  package `__init__.py`.

**`build: pyproject.toml`**: `air_llm/pyproject.toml` (new) +
`air_llm/setup.py` (slimmed down to a 3-line shim)
- The `PostInstallCommand` hack that ran `pip install --upgrade
  transformers` from inside `setup.py install` is no longer needed.
  That hack broke PEP 517 build isolation, editable installs, conda
  environments and any other non-pip build backend.
- … it isn't build-time-installable everywhere.
- Fixes the license classifier from MIT to Apache Software License
  (the LICENSE file is Apache-2.0).
- Adds optional-dependencies groups (compression, test, lint) plus
  `[tool.pytest.ini_options]` and `[tool.ruff]` settings for tests and
  lint.

**`test: offline unit tests`**: `air_llm/tests/test_auto_model_dispatch.py` (new)
- The existing `tests/test_automodel.py` reaches out to Hugging Face to
  download real model configs, which makes it slow, flaky, and
  impossible to run in CI.
- The new test patches `AutoConfig.from_pretrained` and exercises
  the dispatch table directly: one assertion per supported
  architecture, regression tests for the most-specific-match
  ordering, and a check that the dispatch table has no duplicate
  needles.
- `tests/conftest.py` stubs the missing
  `optimum.bettertransformer` module so the package imports cleanly
  on modern `optimum` versions. (See CI below for why this matters.)

**`ci: GitHub Actions`**: `.github/workflows/ci.yml` (new)
- Runs `ruff check` and `pytest` on a Python 3.10, 3.11 and 3.12 matrix.

**`docs(readme)`**: adds a 'How it works' section explaining the
layer-sharded inference pipeline (shard-on-disk, stream layers,
CPU-pinned KV-cache, prefetching, optional compression) and a
'Troubleshooting' section for the four most common support
questions. TOC updated.
Verified locally
```
$ pytest tests/test_auto_model_dispatch.py -v
collected 8 items
... 8 passed in 5.09s
```
Risk
Each commit is independent and small. The refactor of `auto_model.py`
changes one user-visible behavior: instead of a silent fallback to
`AirLLMLlama2` for unknown architectures, it now raises
`UnknownArchitectureError` with a helpful message. The `AutoModel`
class is still constructed the same way (`from_pretrained`).
Happy to split this into multiple PRs if you'd prefer — just let me
know which pieces you're interested in.