Resources

Resource usage

The LSF system uses built-in and configured resources to track resource availability and usage. Jobs are scheduled according to the resources available on individual hosts.

LSF monitors the resources that jobs use while they are running. This information is used to enforce resource limits and load thresholds, and to support fairshare scheduling.

LSF collects information such as:

  • Total CPU time consumed by all processes in the job

  • Total resident memory usage in KB of all currently running processes in a job

  • Total virtual memory usage in KB of all currently running processes in a job

  • Currently active process group ID in a job

  • Currently active processes in a job

On UNIX, job-level resource usage is collected through the Process Information Manager (PIM).

Commands

  • lsinfo — View the resources available in your cluster

  • bjobs -l — View current resource usage of a job

Configuration

  • SBD_SLEEP_TIME in lsb.params — Configures how often resource usage information is sampled by PIM, collected by sbatchd, and sent to mbatchd

Load indices

Load indices measure the availability of dynamic, non-shared resources on hosts in the cluster. Load indices that are built into the LIM are updated at fixed time intervals.

Commands

  • lsload -l — View all load indices

  • bhosts -l — View load levels on a host

External load indices

Defined and configured by the LSF administrator and collected by an External Load Information Manager (ELIM) program. The ELIM also updates LIM when new values are received.

Commands

  • lsinfo — View external load indices

Static resources

Built-in resources that represent host information that does not change over time, such as the maximum RAM available to user processes or the number of processors in a machine. Most static resources are determined by the LIM at startup.

Static resources can be used to select appropriate hosts for particular jobs based on binary architecture, relative CPU speed, and system configuration.

Load thresholds

Two types of load thresholds can be configured by your LSF administrator to schedule jobs in queues. Each load threshold specifies a load index value:

  • loadSched determines the load condition for dispatching pending jobs. If a host’s load is beyond any defined loadSched, a job will not be started on the host. This threshold is also used as the condition for resuming suspended jobs.

  • loadStop determines when running jobs should be suspended.

To schedule a job on a host, the load levels on that host must satisfy both the thresholds that are configured for that host and the thresholds for the queue from which the job is being dispatched.

The value of a load index may either increase or decrease with load, depending on the meaning of the specific load index. Therefore, when comparing the host load conditions with the threshold values, you need to use either greater than (>) or less than (<), depending on the load index.
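The comparison rules above can be sketched in a few lines of Python. This is only an illustration of the logic, not LSF's implementation: the function names, the sample values, and the set of indices assumed to increase with load are all assumptions made for the example.

```python
# Assumption for illustration: these indices grow as a host gets busier;
# any other index (for example free memory) shrinks as load rises.
INCREASES_WITH_LOAD = {"r15s", "r1m", "r15m", "ut", "pg", "io", "ls"}


def exceeds(index, value, threshold):
    """Return True if value is on the overloaded side of threshold."""
    if index in INCREASES_WITH_LOAD:
        return value > threshold
    return value < threshold  # e.g. available memory: lower means busier


def can_dispatch(load, host_sched, queue_sched):
    """A job may start only if host AND queue loadSched are both satisfied."""
    for thresholds in (host_sched, queue_sched):
        for index, limit in thresholds.items():
            if exceeds(index, load[index], limit):
                return False
    return True


def should_suspend(load, load_stop):
    """Suspend when any configured loadStop threshold is exceeded."""
    return any(exceeds(i, load[i], t) for i, t in load_stop.items())


# Hypothetical host load: 1-minute run queue length and available memory.
load = {"r1m": 0.8, "mem": 2000}
print(can_dispatch(load, {"r1m": 1.0}, {"mem": 500}))
print(should_suspend(load, {"r1m": 2.5}))
```

Keeping the direction of each index in one place means the same `exceeds` check serves both the dispatch test and the suspend test.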

Commands

  • bhosts -l — View suspending conditions for hosts

  • bqueues -l — View suspending conditions for queues

  • bjobs -l — View suspending conditions for a particular job and the scheduling thresholds that control when a job is resumed

Configuration

  • lsb.hosts — Configure thresholds for hosts

  • lsb.queues — Configure thresholds for queues

Runtime resource usage limits

Limit the use of resources while a job is running. Jobs that consume more than the specified amount of a resource are signaled or have their priority lowered.

Configuration

  • lsb.queues — Configure resource usage limits for queues

Hard and soft limits

Resource limits that are specified at the queue level are hard limits, while those specified at job submission are soft limits. See the setrlimit(2) man page for the concepts of hard and soft limits.
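As a rough illustration of the operating-system concept that setrlimit(2) describes, the sketch below uses Python's standard `resource` module (UNIX only) to show that a soft limit can be adjusted only within the ceiling set by the hard limit. It does not show how LSF applies limits; the helper names `clamp_to_hard` and `set_soft_limit` are hypothetical.

```python
import resource


def clamp_to_hard(requested, hard):
    """Return the soft limit to use so that it never exceeds the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested  # no ceiling, any soft value is allowed
    if requested == resource.RLIM_INFINITY:
        return hard  # cannot be unlimited under a finite ceiling
    return min(requested, hard)


def set_soft_limit(which, requested):
    """Set the soft limit for `which`, keeping the hard limit unchanged."""
    _soft, hard = resource.getrlimit(which)
    resource.setrlimit(which, (clamp_to_hard(requested, hard), hard))
    return resource.getrlimit(which)


# Inspect the current CPU time limits of this process (soft, hard).
print(resource.getrlimit(resource.RLIMIT_CPU))
```

An unprivileged process can lower its soft limit and raise it again up to the hard limit, but it cannot raise the hard limit itself.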

Resource allocation limits

Restrict the amount of a given resource that must be available during job scheduling for different classes of jobs to start, and specify which resource consumers the limits apply to. If all of the resource has been consumed, no more jobs can be started until some of the resource is released.

Configuration

  • lsb.resources — Configure queue-level resource allocation limits for hosts, users, queues, and projects
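The behavior described for allocation limits (no new job starts once the resource is fully consumed, until some of it is released) can be modeled with a simple counter. This is a conceptual sketch only; the `AllocationLimit` class and its methods are hypothetical and are not part of LSF.

```python
class AllocationLimit:
    """Toy model of a limit on a resource shared by a class of jobs."""

    def __init__(self, total):
        self.total = total   # amount of the resource the limit allows
        self.in_use = 0      # amount currently held by running jobs

    def try_start(self, amount):
        """Reserve `amount` for a job; refuse if the limit would be exceeded."""
        if self.in_use + amount > self.total:
            return False     # job stays pending
        self.in_use += amount
        return True

    def release(self, amount):
        """Return `amount` to the pool when a job finishes."""
        self.in_use = max(0, self.in_use - amount)


limit = AllocationLimit(total=2)
print(limit.try_start(1), limit.try_start(1), limit.try_start(1))
limit.release(1)
print(limit.try_start(1))
```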

Resource requirements (bsub -R)

Restrict which hosts the job can run on. Hosts that match the resource requirements are the candidate hosts. When LSF schedules a job, it collects the load index values of all the candidate hosts and compares them to the scheduling conditions. A job is dispatched to a host only if all load values are within the scheduling thresholds.

Commands

  • bsub -R — Specify resource requirement string for a job

Configuration

  • lsb.queues — Configure resource requirements for queues
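As a small illustration of how a resource requirement string might be assembled for bsub -R, the sketch below builds the argument list without running anything. The helper `build_bsub`, the job command `./my_job`, and the threshold values are hypothetical; only the `select[...]`, `order[...]`, and `rusage[...]` section names come from the LSF resource requirement syntax.

```python
import shlex


def build_bsub(command, select=None, order=None, rusage=None):
    """Return an argv list for bsub with an optional -R requirement string."""
    sections = []
    if select:
        sections.append(f"select[{select}]")   # which hosts are candidates
    if order:
        sections.append(f"order[{order}]")     # how to sort the candidates
    if rusage:
        sections.append(f"rusage[{rusage}]")   # what the job expects to use
    argv = ["bsub"]
    if sections:
        argv += ["-R", " ".join(sections)]
    return argv + list(command)


# Hypothetical job and values, printed as a shell-quoted command line.
argv = build_bsub(["./my_job"], select="mem>500", order="ut", rusage="mem=500")
print(shlex.join(argv))
```

Passing the result to a process launcher as a list, rather than as one string, avoids shell quoting problems with the brackets and comparison operators in the requirement string.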
