Group 6-1

The document outlines key practices for monitoring and maintaining systems, including the use of performance monitoring tools, event logs, alert systems, troubleshooting methods, and patch management. It emphasizes the importance of detecting system issues early, maintaining system performance, and ensuring security through regular updates. Best practices and tools for each area are discussed to enhance system reliability and efficiency.

Uploaded by

opiojoshuara

GROUP SIX

Monitoring and Maintaining Systems

GROUP MEMBERS

WAMBA JEBBI 124-062082-34759

KIKULUBE LATIFU 124-062082-33391
AKUSON FADIA 124-062082-35916
KIPLANANT TIMOTHY 124-062082-36569
FEISALABDULLAHI 124-062082-36881
WASWA SWAIB 124-062082-33455
MASABA ERIC 124-062082-36282
MASSA JONAH 124-062082-33980
Monitoring and Maintaining Systems
Key areas:
 System Performance monitoring tools
 Event logs and alert systems
 Troubleshooting system issues
 Patch management and updates
1) System performance monitoring tools
These tools continuously measure system health and behavior to detect anomalies, capacity issues,
and performance regressions.

They provide historical trends for capacity planning and SLA/SLO tracking.

System performance monitoring involves observing and measuring how efficiently a computer
system or server is operating.

Key Performance Indicators (KPIs)

 CPU Usage: Percentage of processor in use

 Memory Usage (RAM): Amount of memory used vs available

 Disk Performance: Disk space (free vs used), Disk read/write speed

 Network Performance

 Bandwidth usage

 Packet loss and latency
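As a rough illustration of the disk-space indicator above, the sketch below uses Python's standard `shutil.disk_usage` to compute the percentage of a volume that is in use. The function name `disk_used_percent` and the default path are assumptions made for this example only.

```python
import shutil


def disk_used_percent(path: str = ".") -> float:
    """Return the percentage of the volume containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    print(f"Disk used: {disk_used_percent():.1f}%")
```

A monitoring agent would typically sample a value like this at a fixed interval and send it to one of the tools listed below.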

Common Monitoring Tools

 Task Manager (Windows)

 Quick overview of CPU, RAM, disk, and processes

 Performance Monitor (Perfmon)

 Advanced tool with graphs and counters

 Resource Monitor

 Detailed real-time system resource usage

 Third-party tools

 Nagios (server monitoring)

 Zabbix (network and system monitoring)

 SolarWinds (enterprise monitoring)

Importance

 Detects system bottlenecks

 Helps prevent crashes and downtime

 Improves system performance and planning

Best practices

 Define SLOs (service level objectives) and translate them into measurable metrics/alerts.

 Monitor meaningful metrics (avoid alert fatigue).

 Use dashboards for quick triage (overview + detailed drilldowns).

 Use rate and percentiles (p95/p99) for latency, not just averages.

 Retain high-resolution recent data, downsample older data for long-term trends.

 Protect monitoring system availability (redundant collectors, separate monitoring cluster).
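To show why percentiles matter for latency, here is a minimal sketch using Python's standard `statistics` module. The sample data and the function name `latency_summary` are made up for illustration.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize latency samples with the mean and the p95/p99 percentiles."""
    # quantiles(n=100) returns 99 cut points; index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Hypothetical data: 90 fast requests (20 ms) and 10 slow ones (2000 ms).
samples = [20.0] * 90 + [2000.0] * 10
summary = latency_summary(samples)

if __name__ == "__main__":
    print(summary)
```

With this data the mean sits far below the p95 value, so an alert based only on the average would understate what the slowest users experience.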

2) Event logs and Alert systems

Event Logs

Event logs are records of activities and events occurring in the system.

What to log: System logs, application logs, audit/security logs, infrastructure events.

Types of Event Logs in Windows

 Application Logs: Events from applications/software

 System Logs: Operating system events

 Security Logs: login attempts, access control, security events

Event Viewer

This is a tool used to view and analyze logs.

It helps track errors, warnings, and information messages.

Alert Systems

Alert systems notify administrators when issues occur.

Types of Alerts

 Email notifications

 SMS alerts

 System pop-ups

 Automated scripts/actions

Importance

 Enables early detection of problems

 Improves security monitoring

 Supports auditing and accountability

• Alerting strategy

Two classes: Alerting for symptoms (high CPU, errors) and for underlying causes (disk full).

Define severity levels (critical, warning, info) and explicit escalation paths.

Use alert policies: avoid noisy alerts; include runbooks or playbook links in alerts.

Examples of alert rules:

▪ CPU > 90% for 5 minutes → warning/critical.

▪ Error rate > 1% sustained for 2 minutes → page.

▪ Disk usage > 85% → warning; > 95% → immediate page.
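The example rules above can be sketched in a few lines of Python. This is only an illustration: it assumes one metric sample per minute (so a window of 5 samples stands for 5 minutes), and the function names are hypothetical rather than taken from any monitoring product.

```python
def sustained_breach(samples: list[float], threshold: float, window: int) -> bool:
    """Return True if the last `window` samples all exceed `threshold`."""
    if window <= 0 or len(samples) < window:
        return False
    return all(value > threshold for value in samples[-window:])


def disk_severity(used_percent: float) -> str:
    """Map disk usage to the severities used in the example rules."""
    if used_percent > 95:
        return "page"
    if used_percent > 85:
        return "warning"
    return "ok"


if __name__ == "__main__":
    cpu = [50, 91, 92, 93, 94, 95]  # assumed: one sample per minute
    print(sustained_breach(cpu, threshold=90, window=5))
    print(disk_severity(90))
```

Requiring every sample in the window to breach the threshold is one simple way to avoid paging on a short spike.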

Include rate-of-change alerts (rapid growth) not just static thresholds.
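A rate-of-change check can be sketched as follows. The code assumes samples arrive as `(timestamp_in_seconds, value)` pairs and uses a simple first-to-last slope. Real systems often use a regression over the window instead, so treat this as a simplified stand-in with hypothetical names.

```python
from typing import Optional


def growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """Average growth per hour between the first and last (timestamp_s, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (v1 - v0) * 3600.0 / (t1 - t0)


def hours_until_full(samples: list[tuple[float, float]],
                     capacity: float = 100.0) -> Optional[float]:
    """Estimate hours until `capacity` is reached; None if usage is not growing."""
    rate = growth_per_hour(samples)
    if rate <= 0:
        return None
    return (capacity - samples[-1][1]) / rate


if __name__ == "__main__":
    disk = [(0, 60.0), (3600, 70.0)]  # hypothetical: usage rose from 60% to 70% in one hour
    print(hours_until_full(disk))
```

An alert on "less than N hours until full" can fire well before a static 85% threshold is crossed when growth is rapid.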

• Integrations and workflows
Integrate with incident management (PagerDuty, OpsGenie), chatops (Slack/MS Teams),
ticketing (Jira).

Automated remediation for common issues (auto-scaling, service restarts, scripts).

Maintain playbooks/runbooks referenced by alerts.

• Retention and compliance

Set retention policies based on compliance and troubleshooting needs.

Secure logs (encryption, access controls); archive for audits.

3) Troubleshooting system issues

Troubleshooting is the process of identifying and fixing problems in a system.

Common System Issues

 Slow system performance

 Application crashes

 Network connectivity problems

 Hardware failures

Troubleshooting Steps

 Identify the problem

 Gather information

 Analyze possible causes

 Test possible solutions

 Implement the best solution

 Monitor results
Tools for Troubleshooting
Command Prompt Tools

 ping – checks network connectivity

 ipconfig – displays IP configuration

 tracert – traces route to a destination
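When a connectivity check needs to run from a script, a similar test can be done in Python. Note that this is a different technique from `ping`: `ping` uses ICMP, while the sketch below attempts a TCP connection to a specific port with the standard `socket.create_connection`. The function name `tcp_reachable` is an assumption for this example.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A call such as `tcp_reachable("192.0.2.10", 443)` (a placeholder documentation address) reports whether that service port accepts connections, which `ping` alone cannot confirm.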

System Tools

 Task Manager

 Device Manager

 Safe Mode

 Event Viewer

Best Practices

 Follow a systematic approach

 Check logs for errors

 Document issues and solutions

 Always test fixes before applying them to production systems

• Incident lifecycle

Detect → Triage → Contain → Diagnose → Remediate → Recover → Root Cause Analysis (RCA) → Prevent.

First-response checklist (quick triage)

o Gather context: when did it start? Affected services/users? Recent deploys/changes?

o Check dashboards and alerts (what metrics changed first).

o Check logs (tail, filter by correlation ID/time window).

o Check host health: top, free -m, df -h, iostat, vmstat, sar.

o Check application/service status: systemctl status, netstat/lsof for port conflicts, process list.
o Check recent changes: deployments, config changes, patches, scaling events
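The log-checking step in the checklist (filter by correlation ID and time window) can be sketched in Python as below. The log layout is an assumption: each line is taken to start with an ISO 8601 timestamp followed by a space, and the `cid=` field name is hypothetical.

```python
from datetime import datetime


def filter_logs(lines, correlation_id, start, end):
    """Yield lines that mention `correlation_id` and fall inside [start, end].

    Assumes a hypothetical layout where each line begins with an ISO 8601
    timestamp, e.g. "2024-05-01T10:05:00 ERROR cid=abc123 db timeout".
    """
    for line in lines:
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if start <= when <= end and correlation_id in rest:
            yield line
```

Narrowing by time first and by request ID second keeps the output small enough to read during an incident.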

Diagnostic strategies

o Reproduce (if possible) in test environment.

o Isolate the layer: network, host, container, application, DB.

o Use correlation IDs/tracing (OpenTelemetry, Jaeger) to track request path.

o Check resource exhaustion (file descriptors, threads, DB connections).

o Validate configuration (env vars, secrets, connection strings).

o Consider recent deployments or config changes as likely cause.
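Configuration validation is easy to automate. The sketch below reports required environment variables that are unset or blank. The variable names used in the usage line are hypothetical.

```python
import os


def missing_settings(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    print(missing_settings(["DB_URL", "API_KEY"]))
```

Running a check like this at service start-up turns a confusing runtime failure into a clear message about which setting is missing.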

Post-incident

o Conduct RCA: timeline, root cause, contributing factors, remediation, follow-ups.

o Capture learnings in runbooks; update monitoring and alerts to detect earlier.

o Perform blameless postmortems.

4) Patch management and updates

What is Patch Management?

This is the process of updating software to fix bugs, security vulnerabilities, and performance issues.

• Goals

Keep systems secure and stable by applying security and bugfix patches in a controlled way.

Minimize downtime and risk of regressions.

• Patch management lifecycle

o Inventory: track OS, packages, firmware, dependencies.

o Prioritize: critical security CVEs first, then functional fixes.

o Test: apply patches to staging/test environments and run smoke/regression tests.

o Schedule: define maintenance windows; consider off-peak times.

o Deploy: automated rollouts (rolling updates, canary, blue-green) to minimize
impact.

o Verify: health checks and monitoring post-patch.

o Rollback: have rollback plans and backups (snapshot images, database backups).
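The Prioritize step of the lifecycle above can be expressed as a sort. In this sketch the `Patch` record, the severity names, and their ranking are all assumptions chosen for illustration. An actual policy would define its own categories.

```python
from dataclasses import dataclass

# Lower rank means patch sooner. Categories and order are assumed for this example.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Patch:
    package: str
    severity: str      # one of the SEVERITY_RANK keys
    is_security: bool  # True for CVE fixes, False for functional fixes


def prioritize(patches):
    """Order patches: security fixes first, then by severity, then by package name."""
    return sorted(
        patches,
        key=lambda p: (not p.is_security, SEVERITY_RANK[p.severity], p.package),
    )
```

Sorting on a tuple keeps the rule readable: the first element separates security fixes from functional ones, and the second orders each group by severity.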

Tools Used

o Linux package managers: apt, yum/dnf, zypper; use unattended-upgrades carefully.

o Configuration and orchestration: Ansible, Chef, Puppet, SaltStack for consistent patching.

o Windows: Windows Server Update Services (WSUS), SCCM (ConfigMgr), Windows Update for Business.

o Cloud and container strategies: AMI/VM image baking with patches, immutable images, orchestration rollouts.

o Use CI/CD pipelines to build and test new images with updated packages.

Deployment strategies to reduce risk

o Canary: update small subset, monitor, then continue.

o Rolling: update a few nodes at a time with health checks.

o Blue-green: deploy new version into green environment, switch traffic when
healthy.

o Feature flags for application-level changes.
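A rolling update with health checks can be sketched as a loop over batches of nodes. The callbacks `apply_patch` and `is_healthy` are placeholders that the caller would supply. This is a simplified model and not how any particular orchestration tool implements it.

```python
def rolling_update(nodes, apply_patch, is_healthy, batch_size=2):
    """Patch `nodes` in batches and stop at the first unhealthy node.

    Returns (updated_nodes, failed_node); failed_node is None on success.
    """
    updated = []
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for node in batch:
            apply_patch(node)
        for node in batch:
            if not is_healthy(node):
                return updated, node  # halt so remaining nodes keep the old version
            updated.append(node)
    return updated, None
```

Setting `batch_size=1` and stopping after the first node gives a canary-style rollout with the same function.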

Security considerations

o Track CVEs and security advisories; subscribe to vendor feeds.

o Prioritize kernel and remote-exploit patches.

o Combine patch management with vulnerability scanning (Nessus, OpenVAS).

o Validate cryptographic libraries and dependencies.

Governance and compliance

o Maintain patch policies (timelines for applying critical/important patches).

o Keep audit trail of patching activity and approvals.
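A patch policy with timelines can be checked mechanically. The deadlines below (7, 30, and 90 days) are hypothetical values chosen for illustration. The real numbers come from the organization's own policy.

```python
from datetime import date, timedelta

# Hypothetical policy: days allowed between a patch's release and its deployment.
PATCH_SLA_DAYS = {"critical": 7, "important": 30, "moderate": 90}


def patch_due_date(released: date, severity: str) -> date:
    """Return the latest date by which a patch of this severity should be applied."""
    return released + timedelta(days=PATCH_SLA_DAYS[severity])


def is_overdue(released: date, severity: str, today: date) -> bool:
    """Return True if `today` is past the patch's due date."""
    return today > patch_due_date(released, severity)
```

Recording the result of such a check alongside each approval gives auditors a simple trail of which patches met the policy.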

Importance

 Protects against malware and cyber attacks

 Improves system stability

 Keeps software up to date

Best Practices

 Schedule regular updates

 Backup system before patching

 Test patches before deployment

 Keep records of updates

