Monitoring a Process's CPU and Memory Usage with a Python Script

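In day-to-day development and operations work, we often need to monitor the resource usage of a particular process, such as its CPU and memory consumption. Running top or similar tools by hand is inefficient and makes it hard to record data over long periods. This article walks through a script that automatically monitors a given process's resource usage and saves the results to a log file.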

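Features

The script does the following:

It takes two input arguments:
- Process ID (int): the ID of the target process to monitor.
- Log file path (str): the path of the log file where monitoring data is recorded.
Roughly once per second, it runs the top command to get the target process's CPU and memory usage.
It writes each result to the log file as a timestamp, a CPU usage percentage, and a memory usage percentage.
If the target process does not exist, it stops monitoring and exits.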

Implementation

The complete Python script is shown below:

import argparse
import subprocess
import time
import os

def monitor_process(pid: int, log_path: str):
    """Monitor the memory and CPU usage of a process and log it to a file."""
    # Check if the process exists
    if not os.path.exists(f"/proc/{pid}"):
        print(f"Process with PID {pid} does not exist.")
        return

    try:
        with open(log_path, 'a') as log_file:
            log_file.write(f"{'Timestamp':<20} {'CPU(%)':<10} {'Memory(%)':<10}\n")  # Add header to the log file

            while True:
                if not os.path.exists(f"/proc/{pid}"):
                    print(f"Process with PID {pid} has exited. Stopping monitoring.")
                    break

                try:
                    # Execute `top` command and filter for the specific PID
                    result = subprocess.run(
                        ["top", "-b", "-n", "1", "-p", str(pid)],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        text=True
                    )

                    if result.returncode != 0:
                        print(f"Error executing top command: {result.stderr.strip()}")
                        break

                    # Parse the output for the process stats
                    for line in result.stdout.splitlines():
                        columns = line.split()
                        # Compare the PID column exactly; this is stricter than a prefix match on the line
                        if columns and columns[0] == str(pid):
                            if len(columns) > 9:
                                cpu = columns[8]  # CPU usage
                                mem = columns[9]  # Memory usage
                                timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
                                log_file.write(f"{timestamp:<20} {cpu:<10} {mem:<10}\n")
                                log_file.flush()
                                print(f"Logged: {timestamp} - CPU: {cpu}% Memory: {mem}%")
                            break

                    time.sleep(1)  # Wait for 1 second; kept inside the try block so Ctrl+C during the wait is caught

                except KeyboardInterrupt:
                    print("Monitoring stopped by user.")
                    break

    except IOError as e:
        print(f"Failed to write to log file {log_path}: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Monitor process CPU and memory usage.")
    parser.add_argument("pid", type=int, help="Process ID to monitor.")
    parser.add_argument("log_path", type=str, help="Path to the log file.")
    args = parser.parse_args()

    monitor_process(args.pid, args.log_path)

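Code Walkthrough

Argument parsing

The script uses the argparse module to parse two positional arguments:

pid: the process ID, parsed as an int.
log_path: the path of the log file in which the monitoring results are saved.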
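
To see the parsing step in isolation, the sketch below builds the same two-argument parser and feeds it a list instead of the real command line. The helper name build_parser and the path /tmp/monitor.log are illustrative assumptions and are not part of the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the same two-argument parser the script uses (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Monitor process CPU and memory usage.")
    parser.add_argument("pid", type=int, help="Process ID to monitor.")
    parser.add_argument("log_path", type=str, help="Path to the log file.")
    return parser


# Passing a list to parse_args() stands in for the real command line.
# The path below is only an illustrative value.
args = build_parser().parse_args(["12345", "/tmp/monitor.log"])
print(args.pid, args.log_path)
```

Because pid is declared with type=int, a non-numeric value makes argparse print a usage error and exit before monitor_process is ever called.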

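Process existence check

The script decides whether the target process exists by checking for the /proc/{pid} path, which is available on Linux. If the process does not exist, the script prints a message and exits.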
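
On systems without /proc, a different check is needed. The sketch below is an alternative technique, not what the script does: it sends signal 0 with os.kill, which on POSIX systems performs only the existence and permission checks without delivering a signal. The function name pid_exists is an assumption made for illustration.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only, hypothetical helper)."""
    if pid <= 0:
        # Zero and negative values address process groups in kill(), so reject them
        return False
    try:
        os.kill(pid, 0)  # Signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False  # No such process
    except PermissionError:
        return True  # The process exists but belongs to another user
    return True


print(pid_exists(os.getpid()))  # The current interpreter is running, so this is True
```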

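Running the `top` command

The script calls top through subprocess.run in batch mode (-b) for a single iteration (-n 1), restricted to the target process (-p), to obtain its resource usage.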
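
The parsing step can be tried without running top at all. The sketch below pulls the %CPU and %MEM columns out of a text sample; the sample is hand-written in the default procps-ng column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and was not captured from a real run. The function name parse_top_output is an assumption. The column positions depend on the top version and its configuration, so the indices 8 and 9 may need adjusting on other systems.

```python
from typing import Optional, Tuple

# Hand-written sample in the default procps-ng column layout; not real captured output.
SAMPLE_TOP_OUTPUT = """\
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  12345 alice     20   0  123456   7890   4567 S   1.5   0.2   0:01.23 python
"""


def parse_top_output(output: str, pid: int) -> Optional[Tuple[str, str]]:
    """Return (cpu, mem) for the given PID, or None if no matching row is found."""
    for line in output.splitlines():
        columns = line.split()
        # Match the PID column exactly and require the %CPU and %MEM columns to be present
        if len(columns) > 9 and columns[0] == str(pid):
            return columns[8], columns[9]
    return None


print(parse_top_output(SAMPLE_TOP_OUTPUT, 12345))
```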

Log format

The log file has the following format:

Timestamp            CPU(%)     Memory(%)  
2024-12-08 10:00:00  1.5        0.2        
2024-12-08 10:00:01  2.0        0.3        

Timestamp: the time at which the sample was taken.
CPU(%): CPU usage as a percentage.
Memory(%): memory usage as a percentage.
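
Since the log is plain whitespace-separated text, it is easy to read back for analysis. The sketch below parses the sample lines shown above into tuples and computes the average CPU usage. The function name parse_log is an assumption, and the code assumes every data row has exactly four whitespace-separated fields (date, time, CPU, memory).

```python
from typing import List, Tuple

# Sample lines copied from the log format shown above
SAMPLE_LOG = """\
Timestamp            CPU(%)     Memory(%)
2024-12-08 10:00:00  1.5        0.2
2024-12-08 10:00:01  2.0        0.3
"""


def parse_log(text: str) -> List[Tuple[str, float, float]]:
    """Parse log text into (timestamp, cpu_percent, mem_percent) tuples."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        # Skip header and blank lines; data rows hold date, time, CPU, and memory
        if len(parts) != 4 or parts[0] == "Timestamp":
            continue
        records.append((f"{parts[0]} {parts[1]}", float(parts[2]), float(parts[3])))
    return records


records = parse_log(SAMPLE_LOG)
average_cpu = sum(cpu for _, cpu, _ in records) / len(records)
print(records, average_cpu)
```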

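Automatic exit

On every loop iteration the script checks whether /proc/{pid} still exists. If the target process has exited, it prints a message and stops monitoring.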

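Keyboard interrupt handling

The script catches KeyboardInterrupt, so the user can stop it with Ctrl+C. The with block closes the log file on the way out.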
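
One way to make sure Ctrl+C is handled at any point in the loop, including during the one-second wait, is to wrap the whole loop in a single try block. The sketch below shows that pattern in isolation; run_until_interrupted and fake_sample are hypothetical names, and the fake sampler raises KeyboardInterrupt itself to simulate the user pressing Ctrl+C.

```python
import time
from typing import Callable


def run_until_interrupted(sample: Callable[[], None], interval: float = 1.0) -> int:
    """Call sample() repeatedly until Ctrl+C; return the number of completed samples."""
    completed = 0
    try:
        while True:
            sample()
            completed += 1
            time.sleep(interval)  # Ctrl+C during the wait is caught as well
    except KeyboardInterrupt:
        print("Monitoring stopped by user.")
    return completed


# Demo: the fake sampler raises KeyboardInterrupt on its fourth call to simulate Ctrl+C
calls = []


def fake_sample() -> None:
    if len(calls) == 3:
        raise KeyboardInterrupt
    calls.append(time.time())


print(run_until_interrupted(fake_sample, interval=0.01))
```

In a real monitor, sample would be the step that runs top and writes one log line.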

Usage

Save the code above as monitor_process.py.
Run the script in a terminal, passing the process ID and the log file path. For example:

python monitor_process.py 12345 /path/to/logfile.log

The script records the target process's CPU and memory usage about once per second and appends the results to the specified log file.

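Use Cases

Performance tuning: monitor the resource usage of key processes in real time to help find performance bottlenecks.
Troubleshooting: keep a history of a process's resource usage to make root-cause analysis easier.
Automated monitoring: combine the script with other tools or scripts to build fully automated monitoring and alerting.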

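Notes

Permissions: make sure the script has permission to access the target process and to write to the log file.
Process state: the target process must exist when the script starts; otherwise the script prints a message and exits without logging anything.
Overhead: each sample spawns a new top process, which uses a small amount of system resources. Assess the impact before sampling more frequently.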
