Hi,
I had to leave my job in 2018 and am now restarting my career. I used to work with mainframes. Could you please guide me with some practical questions on mainframes, like:
If you get a -811 error in production, how would you fix it?
I said I would delete a row from the table.
How would you decide which row to delete?
I said I would delete the row with the older timestamp. How would you fix it permanently? And since we can't insert duplicate rows with the same primary key, how are such records getting into the table?
This would help greatly.
Re: Practical mainframe questions.
Coincidentally, I have been compiling a list of similar questions. Please review the following to see if they are helpful:
1. Handling -811 SQL Error in Production
Question: If you get a -811 error in production, how would you fix it?
Answer:
A -811 SQL error occurs when a query that is expected to return a single row (for example, a SELECT INTO) returns more than one. This usually happens when the query's predicates do not guarantee uniqueness, for example when the WHERE clause does not cover the full primary key.
Steps to Fix:
- Identify the Duplicate Rows: Check which query caused the error and which table has duplicate rows. This can be done using a SELECT query with conditions to find the duplicates.
- Investigate Data: Review the data in the table to understand why there are duplicates. Are there rows with the same primary key or unique identifier?
- Decide Which Row to Delete: To resolve this temporarily, you could delete the row with the older timestamp if the application tracks timestamps. For example:
Code: Select all
-- Restrict the delete to the duplicated key; without the key
-- predicate, every older row in the table would be deleted
DELETE FROM table_name
 WHERE key_column = 'duplicate-key-value'
   AND timestamp_column < 'desired_timestamp'
- Permanent Fix: To permanently fix this issue, you must ensure that the query fetching the data is written to expect multiple rows when needed. You should also verify that proper constraints (like unique keys or primary keys) are applied to the database to prevent such issues in the future.
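For the permanent fix, a hedged sketch (table and column names are made up for illustration): when at most one row is wanted, a cursor with ORDER BY lets the program pick that row deliberately, so extra qualifying rows no longer abend it:
Code: Select all
      * Hypothetical table and columns; ORDER BY picks newest row
           EXEC SQL
               DECLARE DUPCUR CURSOR FOR
               SELECT ACCT_ID, LAST_UPD_TS
                 FROM ACCT_TBL
                WHERE ACCT_ID = :WS-ACCT-ID
                ORDER BY LAST_UPD_TS DESC
           END-EXEC.
           EXEC SQL OPEN DUPCUR END-EXEC.
      * Only the first (most recent) row is fetched; additional
      * rows are simply never read, so no -811 is raised
           EXEC SQL
               FETCH DUPCUR INTO :WS-ACCT-ID, :WS-LAST-UPD-TS
           END-EXEC.
           EXEC SQL CLOSE DUPCUR END-EXEC.
On reasonably current DB2 versions, adding ORDER BY ... FETCH FIRST 1 ROW ONLY to the SELECT achieves the same effect with less code.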
Question: If we can't insert duplicate rows with the same primary key, how are such records getting into the table?
Answer:
The most common reason is that while the primary key constraint prevents exact duplicates, the failing query usually does not filter on the full key: rows that differ in a key column, or in non-key columns the query ignores, all satisfy its predicates. Business logic or missing validation in the application can also allow such near-duplicates to be inserted. Review the insert logic and the query, and make sure the unique business rules are actually enforced.
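To make this concrete, a hypothetical example (table and key invented): suppose ORDER_TBL has the composite primary key (ORDER_ID, LINE_NO). The key blocks exact duplicates, yet this singleton SELECT raises -811 as soon as an order has two lines, because LINE_NO is missing from the WHERE clause:
Code: Select all
      * Two rows share ORDER_ID as long as LINE_NO differs, so
      * this SELECT INTO can match more than one row: SQLCODE -811
           EXEC SQL
               SELECT ITEM_DESC INTO :WS-ITEM-DESC
                 FROM ORDER_TBL
                WHERE ORDER_ID = :WS-ORDER-ID
           END-EXEC.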
2. Difference Between Static and Dynamic Calls
Question: If program A calls program B, what needs to be done? How would you explain the difference between a static call and a dynamic call to the business, and which call is best for processing 2 billion records?
Answer:
- Static Call:
- In a static call, the called program (Program B) is link-edited into the load module of the calling program (Program A), so the linkage is resolved before run time.
- Advantages: Faster execution, as no program load is needed at run time.
- Use Case: Best when the called program doesn't change frequently and performance is critical (e.g., processing a large number of records, like 2 billion).
- Dynamic Call:
- In a dynamic call, the called program is loaded into memory at run time.
- Advantages: Easier to maintain, as changes to Program B don't require re-link-editing Program A.
- Use Case: Best when the called program is expected to change often, or when different versions of it need to be picked up dynamically.
- For a scenario involving 2 billion records, a static call would generally be preferred, because the performance benefit of resolving the linkage up front outweighs the flexibility of dynamic calls. A short sketch of both call forms follows.
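As a quick sketch of the two call forms (program and field names are illustrative; under Enterprise COBOL the DYNAM/NODYNAM compiler option also decides how a literal CALL is resolved):
Code: Select all
      * Static form: literal name; with NODYNAM it is resolved at
      * link-edit time, so PROGB is bound into PROGA's load module
           CALL 'PROGB' USING WS-COMM-AREA.
      * Dynamic form: name held in a data item, resolved at run
      * time, so a new PROGB is picked up without re-linking PROGA
           MOVE 'PROGB' TO WS-PGM-NAME.
           CALL WS-PGM-NAME USING WS-COMM-AREA.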
3. Resolving a SOC7 Abend in Production
Question: If a SOC7 error occurs in production, how would you identify and resolve it?
Answer:
A SOC7 abend typically occurs due to invalid numeric data in a computational field.
Steps to Resolve:
- Identify the Failing Statement: Check the abend log or dump. Use tools like ABEND-AID or IBM Debug Tool to identify the line of code where the error occurred.
- Locate the Invalid Data: Find the specific record causing the issue. Tools like IBM Fault Analyzer can show you the exact data. If you're told the bad value is in the 9th field of the 1,000,000th record, you can:
- Use a display statement to output the values of fields before the failing statement.
- Use dump tools to examine the hexadecimal representation of the data and identify invalid numeric values.
- Fix the Data: Once the offending data is identified, correct it in the input file, or adjust the code to handle invalid data gracefully (e.g., an IS NUMERIC class test, the NUMVAL intrinsic function, or the NUMCHECK compiler option); a minimal guard is sketched after these steps.
- Re-run the Job: After correcting the data or adjusting the logic, re-run the job to ensure the error is resolved.
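As a minimal defensive sketch (field names are invented): the classic guard is a class test on the input field before it is moved into a computational item, so bad data is reported instead of abending:
Code: Select all
      * IN-AMT is the zoned-decimal amount (e.g., PIC 9(7)V99)
      * straight from the input record; WS-AMT is a COMP-3 field
           IF IN-AMT IS NUMERIC
               MOVE IN-AMT TO WS-AMT
           ELSE
               DISPLAY 'BAD AMOUNT, RECORD KEY: ' IN-REC-KEY
               ADD 1 TO WS-ERR-COUNT
           END-IF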
4. Isolating a Bad Record in a Large File
Question: If there's an error in the 9th field of the 1,000,000th record, how would you identify and fix it?
Answer:
To handle data errors in large files, follow these steps:
- Isolate the Record: If the error message points to the 9th field of the 1,000,000th record, you can write a small utility program (in COBOL, or with a tool like SORT) to pull that record out of the large file for analysis:
Code: Select all
           OPEN INPUT INFILE
           PERFORM UNTIL EOF-FLAG = 'Y'
               READ INFILE
                   AT END
                       MOVE 'Y' TO EOF-FLAG
                   NOT AT END
                       ADD 1 TO RECORD-COUNT
                       IF RECORD-COUNT = 1000000
                           DISPLAY 'ERROR RECORD: ' RECORD-DATA
                       END-IF
               END-READ
           END-PERFORM
           CLOSE INFILE.
- Analyze the Data: Once the specific record is identified, check its contents to determine what the issue is (e.g., invalid numeric value, incorrect format).
- Correct the Data: You can manually correct the data or adjust the code to better handle such records in the future (e.g., adding validation).
5. Debugging a Program That Works in Test but Fails in Production
Question: If a program works fine in the test environment but fails in production, how would you debug the issue?
Answer:
- Check for Environmental Differences:
- The first step is to check if there are any differences in the test and production environments, such as dataset names, file structures, DB2 region configurations, or CICS transaction configurations.
- Ensure that the JCL, input data, and program versions are consistent across both environments.
- Review the Abend Logs or Messages:
- Examine the abend codes, system logs, or error messages generated in production. This will give you insights into what caused the program to fail.
- Use tools like ABEND-AID or FAULT Analyzer to extract detailed information from the dump.
- Check Resource Availability:
- Verify if production resources (datasets, files, DB2 tables, etc.) are available and properly authorized for the program.
- Ensure that production datasets are properly allocated, and there are no space constraints or missing permissions.
- Compare Input Data:
- Verify that the input data in production is consistent with the input used in the test environment. Data differences can cause unexpected behavior, especially in cases where data is incomplete or corrupted.
- Insert Debugging Statements:
- Add additional DISPLAY or logging statements in the program to track the data flow, especially around areas where you suspect the error is occurring. This will help in understanding where the issue lies in the code.
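A small, hedged example of the kind of breadcrumb that helps (all names are illustrative): displaying the key before the suspect statement and the SQLCODE after it usually narrows a production-only failure quickly:
Code: Select all
           DISPLAY 'PROGA BEFORE UPDATE, KEY=' WS-CUST-KEY
           EXEC SQL
               UPDATE CUST_TBL
                  SET STATUS = :WS-STATUS
                WHERE CUST_ID = :WS-CUST-KEY
           END-EXEC
           DISPLAY 'PROGA AFTER UPDATE, SQLCODE=' SQLCODE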
6. Handling a Long-Running Batch Job Nearing Its Time Limit
Question: If you have a long-running batch job that is nearing its time limit in production, how would you handle the situation?
Answer:
- Identify the Performance Bottleneck:
- Use performance monitoring tools like IBM OMEGAMON or MainView to check where the batch job is spending the most time.
- Look for I/O bottlenecks, excessive database locks, or large amounts of data being processed inefficiently.
- Optimize the Code:
- Check for any inefficient loops, unnecessary reads, or excessive database calls in the COBOL program.
- Review the SQL queries for inefficiencies, such as missing indexes or full table scans, and optimize them accordingly.
- Split the Job:
- If the job is processing a large volume of records, you can split it into smaller, parallel-running jobs. This can be done by dividing the input dataset and processing different parts in separate job steps or programs.
- For example, processing data in chunks of 100,000 records at a time can reduce the load on each run.
- Increase the Time Limit:
- If possible, increase the job's time limit by adjusting the TIME parameter in the JCL (TIME=1440 effectively removes the CPU time limit):
Code: Select all
//STEP1   EXEC PGM=MYPROG,TIME=1440
- Checkpoint/Restart Logic:
- Implement checkpoint/restart logic in the COBOL program so that if the job exceeds the time limit, it can resume processing from the last checkpoint instead of starting over.
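A hedged sketch of commit-count checkpointing (CHKPT_TBL and all names are hypothetical): the restart key is hardened in the same unit of work as each commit, so a rerun can reposition past the work already done:
Code: Select all
      * Executed after each record is processed
           ADD 1 TO WS-SINCE-COMMIT
           IF WS-SINCE-COMMIT >= 1000
      *        Save the restart position, then commit the data
      *        changes and the checkpoint together
               EXEC SQL
                   UPDATE CHKPT_TBL
                      SET LAST_KEY = :WS-CUST-KEY
                    WHERE JOB_NAME = :WS-JOB-NAME
               END-EXEC
               EXEC SQL COMMIT END-EXEC
               MOVE 0 TO WS-SINCE-COMMIT
           END-IF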
7. Resolving DB2 Deadlocks in Production
Question: If a DB2 program is encountering deadlocks in production, how would you resolve it?
Answer:
- Identify the Deadlock:
- Review the DB2 logs and use DB2 performance monitoring tools like DB2PE or OMEGAMON for DB2 to identify the deadlocked transactions.
- Check the SQLCODE: a deadlock or timeout typically surfaces as SQLCODE -911 (the unit of work has been rolled back) or -913 (no automatic rollback).
- Check Locking Strategy:
- Review the locking strategy in the application. Ensure that the program is not using excessive locking (such as WITH HOLD cursors or LOCK TABLE statements) that could be causing contention between transactions.
- Consider using row-level locking instead of table-level locking if the contention is due to multiple processes updating rows in the same table.
- Rearrange the SQL Execution Order:
- Change the order of SQL execution in the programs. Ensure that all programs accessing the same resources do so in the same order to prevent circular waits, which cause deadlocks.
- Reduce Transaction Scope:
- Reduce the scope of each transaction by committing smaller units of work more frequently. This reduces the amount of time locks are held, decreasing the likelihood of deadlocks.
- Use Deadlock Timeout Parameter:
- Set a deadlock timeout to handle deadlocks automatically. This allows one transaction to be rolled back when a deadlock occurs, allowing the other transaction to proceed.
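Alongside those measures, programs are often coded to retry when they are picked as the deadlock victim. A hedged sketch (names invented); on SQLCODE -911, DB2 has already rolled the unit of work back, so it is safe to re-drive it from the start:
Code: Select all
           MOVE 0 TO WS-TRY
           PERFORM WITH TEST AFTER
                   UNTIL SQLCODE NOT = -911 OR WS-TRY >= 3
               ADD 1 TO WS-TRY
      *        The whole unit of work is re-driven on each try
               EXEC SQL
                   UPDATE ACCT_TBL
                      SET BALANCE = BALANCE + :WS-AMT
                    WHERE ACCT_ID = :WS-ACCT-ID
               END-EXEC
           END-PERFORM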
8. Identifying and Resolving Memory Leaks in a COBOL Program
Question: How would you identify and resolve memory leaks in a COBOL program running on a mainframe?
Answer:
- Check for Dynamic Memory Allocation:
- COBOL programs can acquire storage dynamically, for example via Language Environment services such as CEEGTST, or the ALLOCATE statement in newer Enterprise COBOL. If that storage is never released, the program leaks memory.
- Ensure that any dynamically acquired storage is released again, e.g., with the FREE statement or the matching CEEFRST service (a paired allocate/free sketch follows this list):
Code: Select all
           CALL 'CEEFRST' USING WS-ADDR, WS-FC.
- Use Mainframe Tools:
- Use tools like IBM Fault Analyzer or CA InterTest to monitor memory usage. These tools can help detect memory leaks and provide details about which areas of the program are consuming excessive memory.
- Optimize Working-Storage:
- Review the WORKING-STORAGE SECTION to ensure that large arrays or variables are not held for the entire program execution unnecessarily. Data structures that are only needed per invocation can be moved to the LOCAL-STORAGE SECTION, which is allocated each time the program is invoked and freed when it returns.
- Test with Smaller Input:
- Test the program with smaller input data sets and monitor the memory usage over time. If memory usage continuously increases during execution, there may be an issue with memory being allocated but not released.
- Check for Recursion:
- If the program uses recursion (through PERFORM or CALL statements), ensure that it properly returns and deallocates resources after each recursion.
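For completeness, a hedged sketch of a matched Language Environment allocate/free pair (feedback-code checking trimmed for brevity); every CEEGTST should be balanced by a CEEFRST on every exit path:
Code: Select all
       01  WS-HEAP-ID   PIC S9(9) COMP VALUE 0.
       01  WS-SIZE      PIC S9(9) COMP VALUE 4096.
       01  WS-ADDR      USAGE POINTER.
       01  WS-FC        PIC X(12).
      * Acquire 4 KB from the initial heap (heap id 0) ...
           CALL 'CEEGTST' USING WS-HEAP-ID, WS-SIZE,
                                WS-ADDR, WS-FC.
      *    ... use the storage via SET ADDRESS OF ... TO WS-ADDR
      * ... and give it back before leaving the program
           CALL 'CEEFRST' USING WS-ADDR, WS-FC.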
9. Ensuring Data Integrity When Processing a Large Batch File
Question: How would you ensure data integrity when processing a large batch file in COBOL?
Answer:
- Check for Duplicate Records:
- Before processing the file, run a check for duplicate records, especially if the file contains primary key values or unique identifiers.
- You can use a COBOL program or SORT utility to identify and remove duplicates before processing.
- Use Control Totals:
- Implement control totals to ensure data consistency. For example, count the number of records processed and compare it with the total number of records in the input file.
- Calculate totals for key numeric fields and verify them against expected values (e.g., summing transaction amounts); a small control-total sketch follows this list.
- Checkpoint/Restart:
- Implement checkpoint/restart logic in long-running batch programs to ensure that in case of a failure, the job can restart from the last successful checkpoint without duplicating work or skipping records.
- Data Validation:
- Perform validation checks on the input data before processing. For example, ensure that numeric fields contain valid numeric values, date fields are valid, and mandatory fields are not null.
- Error Handling:
- Implement robust error-handling routines to trap any unexpected data errors. Write error records to a separate file for investigation without disrupting the main processing flow.
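Tying the control-total idea to code, a minimal sketch (the trailer layout is invented): the record count accumulated during the run is balanced against the count carried in the file's trailer record before the job is allowed to end cleanly:
Code: Select all
      * TRL-REC-COUNT comes from the trailer record of the file;
      * WS-READ-COUNT is incremented for every data record read
           IF WS-READ-COUNT NOT = TRL-REC-COUNT
               DISPLAY 'CONTROL TOTAL MISMATCH, READ='
                       WS-READ-COUNT ' EXPECTED=' TRL-REC-COUNT
               MOVE 16 TO RETURN-CODE
           END-IF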
Thanks,
Anuj
Disclaimer: My comments on this website are my own and do not represent the opinions or suggestions of any other person or business entity, in any way.