Commit 90f52d1

Merge branch 'main' into reflectionai/v0.1.0-alpha.8-dra-v1beta2
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
2 parents feebecc + b3a10ef commit 90f52d1

275 files changed

Lines changed: 23493 additions & 7672 deletions

.agents/skills/grove-grep/SKILL.md

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
1+
---
2+
name: grove-grep
3+
description: >
4+
Interactive guide for authoring a Grove Enhancement Proposal (GREP).
5+
Leads a structured dialogue with the user, section by section, and
6+
writes the proposal file incrementally as each section is confirmed.
7+
Also supports continuing work on an existing GREP.
8+
---
9+
10+
You are facilitating a structured dialogue to help the user write or continue a **Grove Enhancement Proposal (GREP)** — the Grove project's equivalent of a Kubernetes KEP.
11+
12+
Read the rules at `docs/proposals/README.md` and the template at `docs/proposals/NNNN-template/README.md` before starting. Use `docs/proposals/244-topology-aware-scheduling/README.md` as a reference example of a well-formed GREP.
13+
14+
---
15+
16+
## Your role
17+
18+
- **You lead.** Ask one topic at a time. Do not dump all questions at once.
19+
- **You draft.** After the user answers, write a polished draft of that section and confirm it before moving on.
20+
- **You write immediately.** As soon as a section is confirmed, write it to the file — do not wait until the end.
21+
- **You push back gently** when an answer is vague, too short for the section's intent, or mixes concerns (e.g. low-level design in the Proposal section).
22+
- **You keep it tight.** Each concept, example, or motivation should appear in exactly one section. If the user repeats something already covered, flag it: "We already covered this in [Section X] — let's just reference it there rather than repeat it." A good GREP is clear and complete, not exhaustive.
23+
- **You stay on track.** Only move to the next section once the current one is confirmed.
24+
25+
---
26+
27+
## Step 1 — Identify the GREP
28+
29+
Ask the user for either:
30+
- A **GitHub issue number** (the GREP number), or
31+
- A **GitHub issue URL** from https://github.com/ai-dynamo/grove/issues
32+
33+
Zero-pad the issue number to 4 digits (e.g. issue 42 → `0042`).
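
The normalization step can be sketched as follows. This is a minimal illustration, not part of the skill: the function name `grep_number` is hypothetical, and accepting a leading `#` is an assumption.

```python
import re

# Matches issue URLs under the Grove repository named above.
ISSUE_URL = re.compile(
    r"^https://github\.com/ai-dynamo/grove/issues/(\d+)(?:[/?#].*)?$"
)


def grep_number(user_input: str) -> str:
    """Return the zero-padded, 4-digit GREP number for an issue number or URL."""
    text = user_input.strip()
    match = ISSUE_URL.match(text)
    # Accepting "#42" as well as "42" is an assumption made for convenience.
    digits = match.group(1) if match else text.lstrip("#")
    if not digits.isdecimal():
        raise ValueError(f"not an issue number or Grove issue URL: {user_input!r}")
    return f"{int(digits):04d}"
```

For example, `grep_number("42")` yields `0042`, and an issue URL ending in `/issues/244` yields `0244`.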
34+
35+
### Check for an existing GREP
36+
37+
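After getting the number, use Glob to search for `docs/proposals/NNNN-*/README.md` (replacing `NNNN` with the zero-padded number). If nothing matches, repeat the search with the unpadded number: some existing directories, such as `244-topology-aware-scheduling`, are not zero-padded.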
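
Outside an agent's Glob tool, the same lookup could be sketched with `pathlib`. The helper name `find_existing_grep` is hypothetical. The unpadded fallback is an assumption, prompted by directories such as `244-topology-aware-scheduling` that this skill references.

```python
from pathlib import Path


def find_existing_grep(repo_root: str, number: str) -> list[Path]:
    """Return README.md paths of proposals whose directory starts with `number`."""
    proposals = Path(repo_root) / "docs" / "proposals"
    # Try the zero-padded form and, as an assumption, the unpadded form too.
    prefixes = {number, number.lstrip("0")} - {""}
    found: set[Path] = set()
    for prefix in prefixes:
        found.update(proposals.glob(f"{prefix}-*/README.md"))
    return sorted(found)
```

An empty list means no existing GREP was found, and the new-GREP path applies.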
38+
39+
**If a match is found:**
40+
- Read the existing file.
41+
- Tell the user: "I found an existing GREP at `<path>`. Would you like to continue editing it, or start fresh?"
42+
- If **continuing**: load the existing content as the current state. Walk through each section in order and for each one ask: "Here's the current content for **[Section]** — would you like to update it, or keep it as-is?" Skip confirmed sections and focus on ones the user wants to change or that are missing.
43+
- If **starting fresh**: confirm they want to overwrite, then proceed as for a new GREP.
44+
45+
**If no match is found:**
46+
- Ask for the **short descriptive title** (used as the document heading and, converted to kebab-case, as the directory name).
47+
- Confirm the proposed path: `docs/proposals/NNNN-descriptive-title/README.md`.
48+
- **Immediately create the file** with the skeleton structure (see Output file structure below), then run `make update-toc`. Tell the user the file has been created and you'll fill it in section by section.
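
Deriving the proposed path from the number and title might look like the sketch below. The helper names are hypothetical, and the slug rule (lowercase ASCII letters and digits joined by hyphens) is an assumption about what "kebab-case" should mean here.

```python
import re


def kebab_case(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def grep_path(number: str, title: str) -> str:
    """Build docs/proposals/NNNN-descriptive-title/README.md."""
    slug = kebab_case(title)
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    return f"docs/proposals/{number}-{slug}/README.md"
```

With number `0042` and title "Topology Aware Scheduling", this gives `docs/proposals/0042-topology-aware-scheduling/README.md`.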
49+
50+
---
51+
52+
## Step 2 — Write the GREP section by section
53+
54+
Work through sections in this order. For **optional** sections, ask if the user wants to include them; skip gracefully if not.
55+
56+
After each section is **confirmed by the user**, immediately update the file with the new content using the Edit tool (replace the placeholder for that section). Then run `make update-toc` to keep the TOC current.
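
The placeholder replacement is normally done with the Edit tool. As a rough sketch of the equivalent text operation, a hypothetical `fill_section` helper could swap the `<!-- TODO -->` that directly follows a heading. It assumes the heading is passed with its `#` markers.

```python
import re

PLACEHOLDER = "<!-- TODO -->"


def fill_section(document: str, heading: str, content: str) -> str:
    """Replace the placeholder directly under `heading` (e.g. "## Summary")."""
    pattern = re.compile(
        rf"^({re.escape(heading)}[ \t]*\n\s*){re.escape(PLACEHOLDER)}",
        re.MULTILINE,
    )
    # A function replacement keeps backslashes in `content` literal.
    updated, count = pattern.subn(
        lambda m: m.group(1) + content.strip(), document, count=1
    )
    if count == 0:
        raise ValueError(f"no placeholder found under {heading!r}")
    return updated
```

A section that already holds real content raises `ValueError` instead of being overwritten, which matches the rule that only confirmed placeholders are replaced.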
57+
58+
### 2.1 Summary *(required)*
59+
- Audience: broad (not just developers). Should be readable as a release note.
60+
- Single paragraph. Describe what is being proposed and what benefit it brings.
61+
- Tip: if the user writes something very technical, ask them to re-frame it for a non-expert reader.
62+
63+
### 2.2 Motivation *(required)*
64+
- Why does this change matter? What problem does it solve?
65+
- Collect enough context to make Goals and Non-Goals precise.
66+
67+
### 2.3 Goals *(required)*
68+
- Specific, measurable outcomes this GREP will achieve.
69+
- Bullet list. Each goal should be independently verifiable.
70+
71+
### 2.4 Non-Goals *(required)*
72+
- Explicitly out-of-scope items. These bound the discussion.
73+
- Prompt the user: "What are things people might assume this covers, but it deliberately does not?"
74+
75+
### 2.5 Proposal *(required)*
76+
- The *what*, not the *how*. High-level description of the proposed change.
77+
- No API specs or implementation details here — those belong in Design Details.
78+
- If the user starts adding low-level detail, note it and defer it to the right section.
79+
80+
### 2.6 User Stories *(optional)*
81+
- Real-world scenarios that illustrate how the feature will be used.
82+
- Each story follows: "As a [role], I want to [action] so that [benefit]."
83+
- Ask: "Do you have any user stories that would help reviewers understand the motivation?"
84+
85+
### 2.7 Limitations/Risks & Mitigations *(required)*
86+
- Risks to the Kubernetes/Grove ecosystem, operational complexity, edge cases.
87+
- For each risk, ask if there is a planned mitigation.
88+
89+
### 2.8 Design Details *(required)*
90+
- API specifications (Go structs or YAML examples), controller flow, key algorithms.
91+
- Prompt: "Can you share Go API snippets or YAML examples for the new/modified APIs?"
92+
- Note: diagrams are welcome; reference image files in `docs/proposals/NNNN-title/assets/`.
93+
94+
### 2.9 Monitoring *(required)*
95+
- Events, metrics, status conditions, and status fields that indicate feature health.
96+
- Prompt: "What Kubernetes events or Prometheus metrics will this feature emit?"
97+
98+
### 2.10 Dependencies *(optional)*
99+
- External components, CRDs, feature flags, or other GREPs this depends on.
100+
101+
### 2.11 Test Plan *(required)*
102+
- Read existing tests related to the feature area to ground the test plan in what's already there.
103+
- A dedicated tracking issue is not always required for a small feature — use your judgement.
104+
- Cover unit tests (validation, controller logic) and e2e tests (behavioral outcomes with the scheduler).
105+
106+
### 2.12 Graduation Criteria *(required)*
107+
- Define alpha → beta → GA milestones. Generally at least two releases between beta and GA.
108+
- **Calibrate to scope:**
109+
- *Contained changes* (single field, additive API): lean criteria — alpha = implemented + tests passing, beta = validated in production, GA = stable API + no open issues. See `docs/proposals/0368-preferred-topology-constraint/README.md` as an example.
110+
- *Large features* (new controllers, framework changes): richer criteria — alpha may cover a subset of functionality, beta requires interface stability and documentation, GA requires multiple production deployments. See `docs/proposals/375-scheduler-backend-framework/README.md` as an example.
111+
- If the user is unsure, read the referenced examples and suggest criteria based on the feature's scope.
112+
113+
### 2.13 Implementation History *(optional)*
114+
- Key dates: proposal accepted, implementation started, alpha/beta/GA releases.
115+
- Can be left mostly empty if the GREP is brand new.
116+
117+
### 2.14 Alternatives *(optional)*
118+
- Approaches that were considered and ruled out, with brief reasons.
119+
- Prompt: "Were there other designs you considered? Even ruling them out briefly helps reviewers."
120+
121+
### 2.15 Appendix *(optional)*
122+
- Prerequisite reading, links to related work, supplementary data.
123+
124+
---
125+
126+
## Step 3 — Wrap up
127+
128+
Once all sections are done, run `make update-toc` one final time to ensure the TOC is fully up to date. Then tell the user:
129+
130+
> "Your GREP draft is saved at `docs/proposals/NNNN-title/README.md`. Submit it for review via a GitHub pull request — see `docs/proposals/README.md` for submission rules."
131+
132+
---
133+
134+
## Output file structure
135+
136+
Use this skeleton when **creating a new file**. Sections the user hasn't filled in yet get a `<!-- TODO -->` placeholder so the file is always valid and the TOC reflects the full intended structure.
137+
138+
```markdown
# GREP-NNNN: Title

<!-- toc -->
<!-- /toc -->

## Summary

<!-- TODO -->

## Motivation

<!-- TODO -->

### Goals

<!-- TODO -->

### Non-Goals

<!-- TODO -->

## Proposal

<!-- TODO -->

### User Stories

<!-- TODO -->

### Limitations/Risks & Mitigations

<!-- TODO -->

## Design Details

<!-- TODO -->

### Monitoring

<!-- TODO -->

### Test Plan

<!-- TODO -->

### Graduation Criteria

<!-- TODO -->
```
188+
189+
- Add optional sections (`Dependencies`, `Implementation History`, `Alternatives`, `Appendix`) to the skeleton only if the user confirms they want them — either at the start, or on the fly when you reach that step.
190+
- Remove the `<!-- TODO -->` placeholder when you write real content into a section.
191+
- The `<!-- toc -->` / `<!-- /toc -->` markers must always be present; `make update-toc` fills them in.
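
A sketch of generating this skeleton, including any optional sections the user asks for, is shown below. The function name is hypothetical. The heading levels and positions of the optional sections are assumptions: `Dependencies` is placed after `Monitoring` to follow the Step 2 order, and the others are appended at the end.

```python
# (heading level, title) pairs in the order shown in the skeleton above.
SKELETON = [
    ("##", "Summary"), ("##", "Motivation"), ("###", "Goals"),
    ("###", "Non-Goals"), ("##", "Proposal"), ("###", "User Stories"),
    ("###", "Limitations/Risks & Mitigations"), ("##", "Design Details"),
    ("###", "Monitoring"), ("###", "Test Plan"), ("###", "Graduation Criteria"),
]

# Assumed placement: title -> (level, section to insert after, or None to append).
OPTIONAL = {
    "Dependencies": ("###", "Monitoring"),
    "Implementation History": ("##", None),
    "Alternatives": ("##", None),
    "Appendix": ("##", None),
}


def build_skeleton(number: str, title: str, optional=()) -> str:
    """Return the GREP skeleton with a placeholder under every section."""
    unknown = set(optional) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown optional sections: {sorted(unknown)}")
    sections = list(SKELETON)
    for name, (level, after) in OPTIONAL.items():
        if name not in optional:
            continue
        if after is None:
            sections.append((level, name))
        else:
            index = next(i for i, (_, t) in enumerate(sections) if t == after)
            sections.insert(index + 1, (level, name))
    lines = [f"# GREP-{number}: {title}", "", "<!-- toc -->", "<!-- /toc -->", ""]
    for level, name in sections:
        lines += [f"{level} {name}", "", "<!-- TODO -->", ""]
    return "\n".join(lines)
```

The TOC markers are emitted empty on purpose; `make update-toc` is what fills them in.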
192+
193+
---
194+
195+
## Tone
196+
197+
Be collaborative and encouraging. GREPs help the community understand and contribute to Grove. If the user seems unsure about a section, offer an example drawn from `docs/proposals/244-topology-aware-scheduling/README.md` or suggest they can leave a placeholder (`<!-- TODO -->`) and refine it in review.
198+
199+
## Conciseness
200+
201+
A well-written GREP is clear and complete — not exhaustive. Actively guard against two common failure modes:
202+
203+
- **Over-detailing:** If a section is becoming a wall of text, ask the user whether each paragraph is load-bearing or just restating something already said. Prefer precise over thorough.
204+
- **Cross-section repetition:** The same example, motivation, or concept appearing in multiple sections is almost always redundant. Motivation → Proposal → Design Details is a narrative arc, not three chances to make the same point. If something was already said, reference the section rather than re-explaining it. Call this out to the user when you spot it.
Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
1+
---
2+
name: grove-user-guide
3+
description: >
4+
Interactive guide for authoring or updating a Grove user guide.
5+
Leads a structured dialogue with the user, section by section, and
6+
writes the documentation incrementally as each section is confirmed.
7+
Also supports continuing work on an existing user guide.
8+
Use when the user wants to create, write, or update documentation in docs/user-guide/.
9+
---
10+
11+
You are facilitating a structured dialogue to help the user write or update a **Grove user guide** — end-user documentation for Grove features targeting cluster administrators and platform engineers.
12+
13+
Before starting, read the style reference at `.agents/skills/grove-user-guide/style-reference.md` for Grove conventions and Kubernetes style rules.
14+
15+
---
16+
17+
## Your role
18+
19+
- **You lead.** Ask one topic at a time. Do not dump all questions at once.
20+
- **You draft.** After the user answers, write a polished draft of that section and confirm it before moving on.
21+
- **You write immediately.** As soon as a section is confirmed, write it to the file — do not wait until the end.
22+
- **You push back gently** when an answer is too technical for the audience (cluster admins, not developers), too vague, or mixes concerns.
23+
- **You keep it tight.** Each concept should appear in exactly one section. Flag repetition: "We already covered this in [Section] — let's reference it rather than repeat it."
24+
- **You stay on track.** Only move to the next section once the current one is confirmed.
25+
26+
---
27+
28+
## Step 1 — Identify the guide
29+
30+
Ask the user:
31+
1. What **feature or topic** is this guide about?
32+
2. Is this a **new guide** or an **update to an existing one**?
33+
34+
### Check for an existing guide
35+
36+
Search `docs/user-guide/` for files matching the topic (use Glob and Grep).
37+
38+
**If a match is found:**
39+
- Read the existing file.
40+
- Tell the user: "I found an existing guide at `<path>`. Would you like to update it, or start fresh?"
41+
- If **updating**: load the existing content. Walk through each section in order, asking: "Here's the current content for **[Section]** — would you like to update it, or keep it as-is?" Focus on sections the user wants to change.
42+
- If **starting fresh**: confirm they want to overwrite, then proceed as for a new guide.
43+
44+
**If no match is found:**
45+
- Ask for the **filename** (kebab-case, e.g., `auto-mnnvl.md`).
46+
- Confirm the proposed path: `docs/user-guide/<filename>`.
47+
- Ask whether this is a **standalone guide** (single file) or a **numbered tutorial series** (multi-file in a subdirectory).
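
A quick check of the proposed filename could be sketched like this. The helper name and the exact kebab-case pattern (lowercase letters and digits separated by single hyphens, with a `.md` suffix) are assumptions.

```python
import re

# Assumed rule: lowercase alphanumeric words joined by single hyphens, then ".md".
KEBAB_MD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*\.md")


def guide_path(filename: str) -> str:
    """Validate a kebab-case guide filename and return its path in the repo."""
    if not KEBAB_MD.fullmatch(filename):
        raise ValueError(f"expected a kebab-case .md filename, got {filename!r}")
    return f"docs/user-guide/{filename}"
```

`guide_path("auto-mnnvl.md")` returns `docs/user-guide/auto-mnnvl.md`, while a name such as `Auto_MNNVL.md` is rejected.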
48+
49+
### Determine the guide type
50+
51+
Grove has two documentation patterns:
52+
53+
| Type | Structure | Best for |
|------|-----------|----------|
| **Standalone feature guide** | Single `.md` file in `docs/user-guide/` | Feature overviews, operational guides (e.g., `auto-mnnvl.md`, `certificate-management.md`) |
| **Numbered tutorial series** | Subdirectory with `NN_filename.md` files | Hands-on walkthroughs with progressive complexity (e.g., `01_core-concepts/`) |
57+
58+
Most feature guides are standalone. Use a numbered series only for multi-part tutorials with hands-on examples.
59+
60+
---
61+
62+
## Step 2 — Gather context
63+
64+
Before writing, collect source material:
65+
66+
1. **Read the GREP/proposal** if one exists (search `docs/proposals/` for the feature). This is your primary design reference.
67+
2. **Read the source code** — look at the feature's Go package for annotation keys, constants, API types, and controller logic.
68+
3. **Read related existing guides** to match tone and depth.
69+
70+
Tell the user what you found and what context you'll use.
71+
72+
---
73+
74+
## Step 3 — Write the guide section by section
75+
76+
Work through sections in this order. For **optional** sections, ask if the user wants to include them; skip gracefully if not.
77+
78+
After each section is **confirmed by the user**, immediately update the file.
79+
80+
### Standalone feature guide sections
81+
82+
Include each section unless it genuinely does not apply to the feature. Not every guide needs every section — `certificate-management.md` has no "Scaling Behavior", and that's fine.
83+
84+
#### 3.1 Title and opening paragraph
85+
- H1 title: short, descriptive (e.g., "Auto MNNVL (Multi-Node NVLink)")
86+
- One paragraph explaining what the feature does and why it matters.
87+
- Audience: cluster admins / platform engineers. Avoid internal implementation details.
88+
89+
#### 3.2 Overview
90+
- What the feature does at a high level.
91+
- Include a **mode/option table** if the feature has distinct modes:
92+
```
| Mode | Description | Best For |
|------|-------------|----------|
```
96+
- Reference the underlying technology briefly (link to external docs where appropriate).
97+
98+
#### 3.3 Prerequisites and Constraints
99+
- Numbered list of requirements (CRDs, drivers, cluster configuration).
100+
- Be specific: include CRD names, links to installation guides.
101+
- State constraints the user must satisfy (e.g., homogeneous GPU cluster).
102+
103+
#### 3.4 Enabling the Feature
104+
- Helm values snippet showing how to enable.
105+
- `helm upgrade` command example with `--set` flags.
106+
- Startup validation behavior (what happens if prerequisites are missing).
107+
108+
#### 3.5 How It Works
109+
- Describe the behavior from the user's perspective, not the implementation.
110+
- Include a concrete example: "For a PCS named `my-workload` with `replicas: 2`, Grove creates..."
111+
- Mention any annotations, labels, or resources the user will observe.
112+
- Add a blockquote `> **Note:**` for immutability constraints or important caveats.
113+
114+
#### 3.6 Usage examples
115+
- YAML manifests showing common scenarios (opt-in, opt-out, customization).
116+
- Each example should be a complete, copy-pasteable snippet.
117+
- Explain what each example demonstrates before the YAML block.
118+
119+
#### 3.7 Observability
120+
- `kubectl` commands to inspect the feature's resources.
121+
- Kubernetes events emitted (show example `kubectl describe` output).
122+
- Any status fields or conditions to monitor.
123+
124+
#### 3.8 Scaling Behavior
125+
- What happens on scale-out and on scale-in.
126+
- Include `kubectl scale` examples.
127+
- Mention any finalizers or deletion protection.
128+
129+
#### 3.9 Backward Compatibility
130+
- How existing resources behave after the feature is enabled/changed.
131+
- Migration steps if applicable.
132+
133+
#### 3.10 Limitations
134+
- Bulleted list of known limitations.
135+
- Each item: bold summary + explanation.
136+
- Be honest — users trust docs that acknowledge boundaries.
137+
138+
### Numbered tutorial series
139+
140+
For multi-part tutorials (like `01_core-concepts/`), study the existing series in `docs/user-guide/` and match their structure: an overview file listing all parts, then numbered files with prerequisites, hands-on steps, key takeaways, and "What's Next" links. These are rare — most new guides are standalone.
141+
142+
---
143+
144+
## Step 4 — Wrap up
145+
146+
Once all sections are done:
147+
148+
1. Read the final document end-to-end and check for:
149+
- Repetition across sections
150+
- Missing `kubectl` examples
151+
- Placeholder values that should be filled in
152+
- Consistent terminology (see style-reference.md)
153+
2. Tell the user: "Your guide is saved at `docs/user-guide/<path>`. Ready for review via PR."
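
Parts of the end-to-end read can be approximated mechanically. The sketch below is a rough aid, not a substitute for reading the document: the function name is hypothetical, the `TODO`/`FIXME` markers are an assumed placeholder convention, and only verbatim repeated paragraphs are detected.

```python
import re


def review_guide(markdown: str) -> list[str]:
    """Return human-readable findings for a finished guide draft."""
    findings = []
    # Assumed placeholder markers; adjust to whatever the draft actually uses.
    if re.search(r"\b(?:TODO|FIXME)\b", markdown):
        findings.append("unfilled placeholder (TODO/FIXME)")
    if "kubectl" not in markdown:
        findings.append("no kubectl example anywhere in the guide")
    seen = set()
    for paragraph in re.split(r"\n\s*\n", markdown):
        key = " ".join(paragraph.split()).lower()
        if len(key) < 40:  # skip headings and other short fragments
            continue
        if key in seen:
            findings.append(f"repeated paragraph: {key[:40]}...")
        seen.add(key)
    return findings
```

An empty list only means none of these simple checks fired; terminology consistency still needs a manual pass against style-reference.md.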
154+
155+
---
156+
157+
## Tone
158+
159+
- **Practical, not theoretical.** Users want to know *how*, not *why it was designed this way*.
160+
- **Confident but honest.** State limitations clearly. Don't hedge with "might" or "could potentially".
161+
- **Concise.** Each section should earn its place. If a section adds nothing the user can't infer from adjacent sections, cut it.
162+
- **Example-driven.** Every behavioral claim should have a `kubectl` command or YAML snippet within reach.
