PCP 4.0 Plans


This document outlines the major pieces of work that are being considered for the PCP 4.0 release.

This is Version 1.4 of this document.

Changes since Version 1.3 are shown with a blue revision bar to the right of a paragraph or point that has been added, amended or deleted.

Plans are very fluid and we'd encourage debate and discussion on items to be added to or removed from this list. Of course arguments in support of work items are much stronger if the advocate is also volunteering to do the work and augment the QA coverage.

As part of the investigations and to develop some "proof of concept" confidence, items marked Done are completed in the "pcp4" branch of Ken's local git tree and will be pushed to oss.sgi.com over time.

Useful Components Based on the New Event Record Services

We're looking to develop a new PMDA that is both useful and a working demonstration of the Event Record features added in PCP 3.5.

This would include:

One initial idea is a PMDA that uses ptrace() to collect system call event traces and system call stats for one process per PMAPI client of the PMDA.

Develop a general dumper for a stream of event records, akin to pmval. The new pmevent application serves this purpose. Done

A related piece of work is to generalize the original event record support so that a single metric may have multiple instances, each of which has a value that is a packed array of event records (just like all other types of PCP metrics). This involves an external API change for pmUnpackEventRecords() to specify which instance to unpack and some consequential API changes for internal routines. Done
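To make the "which instance to unpack" idea concrete, here is a minimal, self-contained sketch. It is only a model of the concept: the types and the function demo_unpack_event_records() are hypothetical stand-ins invented for illustration, not the libpcp declarations, and plain integers stand in for packed event records.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT libpcp types. */
typedef struct {
    int inst;            /* instance identifier */
    int nrecords;        /* number of event records held for this instance */
    const int *records;  /* stand-in for the packed array of event records */
} demo_instance_value;

typedef struct {
    int numval;                        /* number of instances in the value set */
    const demo_instance_value *vlist;  /* one entry per instance */
} demo_value_set;

/*
 * Select the instance at position idx in the value set and hand back its
 * event records.  Returns the number of records for that instance, or -1
 * if an argument is NULL or idx is out of range.
 */
int demo_unpack_event_records(const demo_value_set *vsp, int idx,
                              const int **records_out)
{
    if (vsp == NULL || records_out == NULL || idx < 0 || idx >= vsp->numval)
        return -1;
    *records_out = vsp->vlist[idx].records;
    return vsp->vlist[idx].nrecords;
}
```

The point of the extra index argument is that a caller iterating over the instances of an event metric can unpack each instance's records independently, rather than assuming a single value per metric.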

Also, remove the macro PM_CLUSTER_EVENT as there is no longer a need to reserve a special value for the cluster field of a PMID for an event type metric. Done

Optional Client Authentication

There are several places, e.g. the proc metrics, pmlogger control via pmlc, the new ptrace PMDA, etc. where it would be advantageous if we had a more sophisticated authentication mechanism than the current coarse-grained access control mechanism based on client IP address.

The details are TBD, but we're sure we should embrace an existing authentication infrastructure, not create a new one.

There has been one vote in favour of supporting Kerberos authentication.

PCP Archive Format and pmlogger Changes

Remove the ability to create Version 1 PCP archives.

Drop support for reading Version 1 PCP archives. The first open source PCP release was 2.1.1 on 7 Dec 1999, so Version 1 archives have never been the default in the open source release and were only ever the default for the 1.x releases of PCP within SGI and IRIX. Dropping all support for Version 1 PCP archives would clean up a lot of code internally, and we suspect would not be noticed outside the atrophied corners of PCP QA where Version 1 archives are still exercised (but even this use has been retired with a QA tool to convert archives from Version 1 to Version 2).

On a related note, both pmlogextract and pmlogreduce need to lose the ability to create Version 1 archives.

Automate volume switching before the data volume reaches 2^31 bytes. The temporal index for PCP archives only works if every offset within a data volume fits in a 32-bit signed integer, that is, is less than 2^31.

There appears to be a related issue in pmlogextract: when combining more than one archive, the output archive data volume could exceed the maximum, so the same automated volume switching logic is required here.

The other log writing tool, pmlogreduce, is also exposed to this issue and needs a similar fix.
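The check that all three log writing tools need can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in pmlogger: the function name demo_need_volume_switch() and the macro DEMO_VOL_LIMIT are hypothetical, and the sketch assumes the writer knows the current offset in the data volume and the length of the record it is about to append.

```c
#include <stdint.h>

/*
 * Offsets must fit in a signed 32-bit integer, so the largest usable
 * offset is 2^31 - 1.  (Hypothetical name, for illustration only.)
 */
#define DEMO_VOL_LIMIT ((int64_t)INT32_MAX)

/*
 * Return 1 if appending record_len bytes at cur_offset would take the
 * data volume past the limit, meaning the writer should switch to a new
 * volume before writing; return 0 otherwise.
 * Precondition: cur_offset and record_len are non-negative.
 */
int demo_need_volume_switch(int64_t cur_offset, int64_t record_len)
{
    /* Written as a subtraction so the comparison cannot overflow. */
    return cur_offset > DEMO_VOL_LIMIT - record_len;
}
```

Doing the test before each write, rather than after, means no offset beyond the limit is ever recorded in the temporal index. A real implementation might well switch earlier to leave a safety margin; the exact threshold here is an assumption.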

Remove version 1 of the protocol between pmlc and pmlogger, and with it the LOG_PDU_VERSION1 macro and the DATA_X PDU.

All of the items in this section are Done.

Make libraries Thread-Safe

Definitely a "stretch" goal, but now Done for libpcp, which is the only library under consideration to be made thread-safe at this stage.

With Greg's hit list as a starting point the following general changes have been made:

The build for libpcp includes a script and control file that will cause the build to fail if any changes are made to the static data symbols in the library.

Retire __pmPool* Routines

The original justification for __pmPoolAlloc, __pmPoolFree and __pmPoolCount has long since gone (very few will even know the history here, and I'll wager Akmal Khan no longer cares). Modern malloc libraries will do the job and we can remove lots of annoying special case handling of memory allocations and releases. Done

Remove all the Asynchronous Routines in libpcp

The asynchronous variants of the core libpcp routines are not used as far as we know, and are definitely not tested in the PCP QA suite.

The following routines (and any static routines they alone use) would be removed from the library: pmContextConnectChangeState, pmRequestStore (bogus in pmapi.h, not in the source code), pmContextConnectTo, pmContextUndef, __pmGetBusyHostContextByID, pmGetContextFD, pmGetContextTimeout, __pmGetHostContextByID, pmLoopMain, pmLoopRegisterChild, pmLoopRegisterIdle, pmLoopRegisterInput, pmLoopRegisterSignal, pmLoopRegisterTimeout, pmLoopStop, pmLoopUnregisterChild, pmLoopUnregisterIdle, pmLoopUnregisterInput, pmLoopUnregisterSignal, pmLoopUnregisterTimeout, pmReceiveDesc, pmReceiveFetch, pmReceiveInDom, pmReceiveInDomInst, pmReceiveInDomName, pmReceiveNameID, pmReceiveNames, pmReceiveNamesAll, pmReceiveNamesOfChildren, pmReceiveStore (bogus in pmapi.h, not in the source code), pmReceiveText, pmReceiveTraversePMNS, pmRequestDesc, pmRequestFetch, pmRequestInDom, pmRequestInDomInst, pmRequestInDomName, pmRequestInDomText, pmRequestNameID, pmRequestNames, pmRequestNamesOfChildren, pmRequestStore (bogus in pmapi.h, not in the source code), pmRequestText, pmRequestTraversePMNS, pmStoreCheck and pmStoreSend.

And the associated error codes will be retired, namely PM_ERR_ISCONN, PM_ERR_NEEDPORT, PM_ERR_WANTACK and PM_ERR_CTXBUSY. Note that PM_ERR_NOTCONN will be retained because it is also used outside the context of the asynchronous routines, and that these changes will involve removing some of the error code mappings in the Perl parts of the source code.


We no longer need the pmcd_ctl_state_t struct, nor the members pc_curpdu, pc_fdflags and pc_state in the __pmPMCDCtl structure. This allows the following related macros to be retired: PC_FETAL, PC_CONN_INPROGRESS, PC_WAIT_FOR_PMCD and PC_READY.

There was a suggestion of a bid to maintain these services, based on additional contributions to PCP that would provide a real-world consumer and QA coverage for the asynchronous services, but this did not happen, so all of the code related to the asynchronous APIs has now been removed from the code base.


Retire Version 1 PCP PMAPI

We've been using Version 2 of PCP PMAPI for years, but there are still some remnants of the old Version 1 PMAPI remaining in the code base. While it is not possible to remove all the Version 1 uses internally (see notes below), the plan is to cut this to the absolute minimum to reduce baggage and obfuscation in the source code.

Specifically, this means:

All of the items in this section are Done.

Retire PMDA_INTERFACE_1 for libpcp_pmda

There are no PMDAs using PMDA_INTERFACE_1 now, and the support in the code base adds obfuscation and is not tested in PCP QA.

This change also removes the one structure from the version union of the pmdaInterface structure, and the HAVE_V_ONE macro from the libpcp_pmda source. Code changes are also required in libpcp for PM_CONTEXT_LOCAL, pmcd and dbpmda. Done

PMNS Changes

Make Version 2 the Default for a Binary PMNS

We've been using the Version 1 format for binary PMNS files (as created by pmnscomp) for a long time, so Version 0 can be retired, especially since Version 0 is neither endian safe nor word size safe! Version 2 has been supported for at least 4 years, and is similar to Version 1 with the added protection of some internal integrity checking via checksumming. Henceforth, pmnscomp will generate a Version 2 binary PMNS by default (-v 1 is still supported on the command line), and libpcp will continue to provide support for reading binary PMNS files in either Version 1 or Version 2 format, but not Version 0 format. Done

Remove the Binary PMNS

Subsequent to the "Version 2 is the default" change described above, it was suggested that the binary format for the PMNS is no longer needed, as the PMNS is loaded so infrequently these days (typically once by pmcd at startup and following the installation or removal of a PMDA), that we could drop the binary format altogether.

This would mean turning pmnscomp into something that simply exits successfully (for any legacy script use), auditing the places where pmnscomp is used in the PCP code and taking the opportunity there to remove any old binary PMNS files found, and removing all the binary PMNS support behind pmLoadNameSpace().

All the work in this section is now Done.

Remove the dependence on cpp

For some production deployments it is annoying to require cpp to be installed to process the ASCII PMNS (in some packaging environments this may mean the whole C compiler toolchain needs to be installed). A new light-weight pre-processor, pmcpp, has been proposed; it was first implemented in PCP 3.5.3 and forward ported to PCP 4.0. Done

Rework "init" Scripts

Originally there were no pmlogger control services and the only place pmlogger could sensibly run was on a host where pmcd was also running, which is why pmcd and pmlogger share the one "init" script. Now that pmlogger may be running with no local pmcd running, it would be more flexible to separate these two services so each has its own self-contained script. For the system start operations, pmcd's script must run first. For the system stop operations, pmlogger's script must run first.

Items Awaiting Investigation

Deeper code coverage analysis to improve the QA and identify code changes to improve quality in the core PCP components.

Ken McDonell
Feb 2012