For anyone who has found themselves staring blankly at the seemingly random errors returned during a CMDB sync, here are some examples, explanations and possible resolutions. I’ll keep adding to this post as I come across new error codes. If anyone else has comments, suggestions or errors, feel free to reply to this post.
ERROR (120149):
Example:
com.tideway.integrations.cmdbsync.exception.CMDBAccessException:
An error occurred committing the Atrium transaction.
The transaction has been rolled back.ERROR (120149): ; Size: 0, Type: calloc, Source File: cmdbsvc.cpp, Line Number: 3862
at com.tideway.integrations.cmdbsync.provider.atrium76.Atrium76Connector.commit(Atrium76Connector.java:151)
Explanation: This error does not appear to have any functional effect. It may be caused by ADDM not receiving confirmation of the sync’s success quickly enough, but I can’t say for sure. What I can say is that the records in CMDB will still have been updated.
Further testing shows that this error appears ONLY when updating existing CIs in CMDB. Creating new records does not cause it.
BMC’s response suggests this is an issue with running CMDB on AIX machines, for which there is a hotfix. It’s odd that the hotfix wasn’t freely available on this site before, so I’m posting it now. I would suggest consulting BMC before making any changes to ADDM, but for those of you who tackle problems with reckless abandon, the files are attached (ISS03777194).
Resolution: Hotfix, attached
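Every error in this post arrives as one or more “ERROR (nnnn)” codes embedded in the CMDBAccessException text, and in the bulk-transaction cases the real cause follows the generic 9713 wrapper. A minimal Python sketch for pulling out the codes (and the failing operation number, when present) might look like this. The function names are hypothetical, and the patterns assume the message format shown in the examples in this post.

```python
from __future__ import annotations

import re

# "ERROR (12345):" as it appears in the CMDBAccessException text (assumed format).
ERROR_CODE = re.compile(r"ERROR \((\d+)\)")
# "Operation 16 in bulk transaction failed" identifies the failing operation.
FAILED_OPERATION = re.compile(r"Operation (\d+) in bulk transaction failed")
# 9713 only reports that a bulk transaction failed; the cause is a later code.
BULK_WRAPPER = 9713


def error_codes(message: str) -> list[int]:
    """Return every ERROR (n) code in the order it appears in the message."""
    return [int(code) for code in ERROR_CODE.findall(message)]


def root_cause(message: str) -> int | None:
    """Return the last code that is not the 9713 wrapper, or None."""
    causes = [code for code in error_codes(message) if code != BULK_WRAPPER]
    return causes[-1] if causes else None


def failed_operation(message: str) -> int | None:
    """Return the operation number of a failed bulk transaction, if reported."""
    match = FAILED_OPERATION.search(message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "The transaction has been rolled back.ERROR (9713): Bulk entry transaction "
        "has failed due to an error encountered during an individual operation; "
        "Operation 16 in bulk transaction failed "
        "ERROR (566): Deadlock during SQL operation to the database;"
    )
    print(error_codes(sample))       # [9713, 566]
    print(root_cause(sample))        # 566
    print(failed_operation(sample))  # 16
```

For a message with no 9713 wrapper, such as the 120149 example, root_cause simply returns that code.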
ERROR (10000):
Example:
com.tideway.integrations.cmdbsync.exception.CMDBAccessException:
An error occurred committing the Atrium transaction.
The transaction has been rolled back.ERROR (9713): Bulk entry transaction has failed due to an error encountered during an individual operation;
Operation 16 in bulk transaction failed
ERROR (10000): ; There already exists a Computer System CI with this name within the current dataset. Please use a unique name.
Explanation: This error occurs when CMDB still holds data for a previous incarnation of the “same” host. The hostname matches an existing CI in CMDB, but the key is different. CMDB requires both the key and the name of any ComputerSystem CI to be unique within the dataset.
Resolution: One fix is to delete the old host’s CI record from CMDB, then sync the new host. Be aware, however, that this may cause problems further down the line if data from the BMC.ADDM dataset is being normalized with the BMC.ASSET dataset. Another is to wait for the old host to age out, taking its CMDB record with it (if that’s how your CMDB is set up). A third option is to rename the old CI to something like “20110809_old_%hostname%”. This lets you keep the record and its relationships in CMDB, but only if any manual records are updated to match.
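If you go with renaming, generating the new name from the retirement date keeps the convention consistent. A minimal sketch, assuming the date is formatted as YYYYMMDD as in the example name above; it only builds the string, and the rename itself still has to be made in CMDB however you normally edit CIs. The function name and the sample hostname are hypothetical.

```python
from datetime import date


def retired_ci_name(hostname: str, retired_on: date) -> str:
    """Build a name such as 20110809_old_myhost for a superseded ComputerSystem CI."""
    hostname = hostname.strip()
    if not hostname:
        raise ValueError("hostname must not be empty")
    # YYYYMMDD prefix, matching the example naming convention.
    return f"{retired_on:%Y%m%d}_old_{hostname}"


if __name__ == "__main__":
    print(retired_ci_name("myhost", date(2011, 8, 9)))  # 20110809_old_myhost
```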
ERROR (566):
Example:
An error occurred committing the Atrium transaction.
The transaction has been rolled back.ERROR (9713): Bulk entry transaction has failed due to an error encountered during an individual operation;
Operation 16 in bulk transaction failed
ERROR (566): Deadlock during SQL operation to the database;
Explanation: This error most commonly occurs when CMDB table rows are left locked by a failed transaction and another operation then attempts to write to that row or a related one.
Resolution: The failed transaction that led to this error may eventually resolve itself, but I would suggest checking the database for long-running sessions related to the transaction. This should let you identify and kill the offending session.
Further analysis suggests that if too many of these errors (4-5 or more) occur at or around the same time, they will grind the CMDB to a complete halt. At that point, both the ADDM appliance and the CMDB server will need to be rebooted.
The only fix at present is to set the number of Sync Threads to 1 (instead of 8). With only one transaction running at a time, it cannot interfere with another sync transaction.
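The “4-5+ errors at or around the same time” rule of thumb can be turned into a simple check over the timestamps of ERROR (566) occurrences in the sync log. This is a minimal sketch: the five-minute window and threshold of four are assumptions standing in for “at or around the same time”, the function name is hypothetical, and how you collect the timestamps depends on where your logs live. It only flags the condition; the remedy is still the single sync thread.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def deadlock_burst(
    timestamps: list[datetime],
    window: timedelta = timedelta(minutes=5),  # assumed meaning of "around the same time"
    threshold: int = 4,                        # lower end of the "4-5+" observation
) -> bool:
    """Return True if `threshold` or more 566 errors fall inside any `window`."""
    times = sorted(timestamps)
    start = 0
    for end, current in enumerate(times):
        # Advance the left edge until the window spans at most `window`.
        while current - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


if __name__ == "__main__":
    base = datetime(2011, 8, 9, 10, 0)
    clustered = [base + timedelta(minutes=m) for m in (0, 1, 2, 3)]
    spread_out = [base + timedelta(minutes=10 * m) for m in range(4)]
    print(deadlock_burst(clustered))   # True
    print(deadlock_burst(spread_out))  # False
```

A sliding window is used rather than fixed time buckets so that a burst straddling a bucket boundary is still caught.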
ERROR (120040) / (9713):
Example:
An error occurred committing the Atrium transaction.
The transaction has been rolled back.ERROR (9713):
Bulk entry transaction has failed due to an error encountered during an individual operation; Operation 7 in bulk transaction failed
ERROR (120040): The relationship endpoint instance does not exist.;
Rel(clsId:BMC.CORE:BMC_INIPSUBNET instId: OI-00ca9e2dbe5c4de08f2d55ff80ad3ffb can't be created. L-endpoint inst does not exist -- , instId: OI-f48554ed464741be951cccd20d629225
Explanation: I’m not sure I have an explanation for these errors, other than unexpected behaviour in the CMDB database. The error suggests that a relationship cannot be created because one of its CIs does not exist. This is very odd, since the CDM pattern creates both CIs before the relationship; even the traversals in the CDM pattern dictate this order. Odder still, the error refers to an ‘L-endpoint’ (which I assume stands for BMC_LANENDPOINT) and a ‘BMC_INIPSUBNET’ relationship, yet there is no relationship between these CI classes: BMC_INIPSUBNET relates to BMC_IPENDPOINT, and BMC_IPENDPOINT relates to BMC_LANENDPOINT.
Resolution: Looking at the data in CMDB, all relationships belonging to these CIs were created or modified at the time of this error. I can only surmise that this is similar to the 120149 error: check the affected data first, then ignore the error if everything looks OK.
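When checking the data for this error, the useful parts of the message are the class ID and the instance IDs, which you can then look up in CMDB. A minimal sketch for extracting them, assuming the message always follows the “clsId:… instId: OI-…” layout shown in the example; the message does not say outright which ID belongs to which object, so the sketch simply returns them in the order they appear. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Patterns assume the layout of the 120040 message quoted in this post.
CLASS_ID = re.compile(r"clsId:(\S+)")
INSTANCE_ID = re.compile(r"instId:\s*(OI-[0-9A-Fa-f]+)")


def relationship_error_ids(message: str) -> tuple[str | None, list[str]]:
    """Return the class ID and every instance ID mentioned, in order."""
    class_match = CLASS_ID.search(message)
    class_id = class_match.group(1) if class_match else None
    return class_id, INSTANCE_ID.findall(message)


if __name__ == "__main__":
    message = (
        "Rel(clsId:BMC.CORE:BMC_INIPSUBNET instId: OI-00ca9e2dbe5c4de08f2d55ff80ad3ffb "
        "can't be created. L-endpoint inst does not exist -- , "
        "instId: OI-f48554ed464741be951cccd20d629225"
    )
    class_id, instance_ids = relationship_error_ids(message)
    print(class_id)  # BMC.CORE:BMC_INIPSUBNET
    for instance_id in instance_ids:
        print(instance_id)
```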
