TOP SQL Interview Queries

- SELECT EMP_ID, NAME FROM EMPLOYEE_TBL WHERE EMP_ID = '0000' ;
- SELECT EMP_ID, LAST_NAME FROM EMPLOYEE
- WHERE CITY = 'Seattle' ORDER BY EMP_ID;
- SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;
- SELECT SUM(Salary) FROM Employee WHERE Emp_Age < 30;
- SELECT AVG(Price) FROM Products;
- SELECT * FROM My_Schema.views;
- CREATE VIEW Failing_Students AS
- SELECT S_NAME, Student_ID
- FROM STUDENT
- WHERE GPA < 40;
- SELECT * FROM Failing_Students;
- CREATE OR REPLACE VIEW [Product List] AS
- SELECT ProductID, ProductName, Category
- FROM Products
- WHERE Discontinued = 'No';
- DROP VIEW V1;
- SELECT * FROM Sys.Objects WHERE Type = 'U'   -- user tables
- SELECT * FROM Sys.Objects WHERE Type = 'PK'  -- primary key constraints
- SELECT * FROM Sys.Objects WHERE Type = 'UQ'  -- unique constraints
- SELECT * FROM Sys.Objects WHERE Type = 'F'   -- foreign key constraints
- SELECT * FROM Sys.Objects WHERE Type = 'TR'  -- triggers
- SELECT * FROM Sys.Objects WHERE Type = 'IT'  -- internal tables
- SELECT * FROM Sys.Objects WHERE Type = 'P'   -- stored procedures
- UPDATE Customers SET Zip=Phone, Phone=Zip
- SELECT DISTINCT ID FROM Customers
- SELECT TOP 25 * FROM Customers WHERE Customer_ID IS NOT NULL;
- SELECT * From Customers WHERE Name LIKE 'Herb%'
- SELECT Customers.ID FROM Customers INNER
- JOIN Orders ON Customers.ID = Orders.ID
- SELECT phone FROM Customers
- UNION SELECT item FROM Orders
- SELECT Item AS item_description FROM Orders
- SELECT Item FROM Orders
- WHERE id = ALL
- (SELECT ID FROM Orders
- WHERE quantity > 50)
- /* The query below is commented out so it won't execute */
- /*
- SELECT item FROM Orders
- WHERE date = ALL (SELECT Order_ID FROM Orders
- WHERE quantity > 50)
- */
- /* The SQL query below will be executed,
- ignoring the text after "--" */
- SELECT item -- single-line comment
- FROM Orders -- another single-line comment
- WHERE id = ALL (SELECT ID FROM Orders
- WHERE quantity > 25)
- CREATE DATABASE AllSales
- CREATE TABLE Customers (
- ID varchar (80),
- Name varchar (80),
- Phone varchar (20),
- ....
- );
- ALTER TABLE Customers ADD Birthday varchar(80)
- DROP TABLE table_name
- CREATE TABLE Customers (
- ID int NOT NULL,
- Name varchar(80) NOT NULL,
- PRIMARY KEY (ID)
- );
- ID int NOT NULL AUTO_INCREMENT
- SELECT * FROM Customers
- SELECT Name FROM Customers WHERE EXISTS
- (SELECT Item FROM Orders
- WHERE Customers.ID = Orders.ID AND Price < 50)
- INSERT INTO Yearly_Orders
- SELECT * FROM Orders
- WHERE Date <= '2018-01-01'
- SELECT Item, Price *
- (QtyInStock + IFNULL(QtyOnOrder, 0))
- FROM Orders
- SELECT SUBSTRING_INDEX("www.bytescout.com", ".", 2);
- SELECT COALESCE(NULL, NULL, 'ByteScout', NULL, 'Byte')
- SELECT CONVERT(int, 27.64)
- SELECT eno,
- dno,
- salary,
- DENSE_RANK() OVER (PARTITION BY dno ORDER BY salary) AS ranking
- FROM employee;
ENO | DNO | SALARY | RANKING |
---|---|---|---|
7933 | 10 | 1500 | 1 |
7788 | 10 | 2650 | 2 |
7831 | 10 | 6000 | 3 |
7362 | 20 | 900 | 1 |
7870 | 20 | 1200 | 2 |
7564 | 20 | 2575 | 3 |
7784 | 20 | 4000 | 4 |
7903 | 20 | 4000 | 4 |
7901 | 30 | 550 | 1 |
7655 | 30 | 1450 | 2 |
7522 | 30 | 1450 | 2 |
7844 | 30 | 1700 | 3 |
7493 | 30 | 1500 | 4 |
7698 | 30 | 2850 | 5 |
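The next result set pairs each employee with the average salary computed over the whole table (AVG_SAL ≈ 2173.21). The query that produced it is not included above; a minimal sketch of the kind of window query involved (column aliases assumed):
- SELECT eno,
- dno,
- salary,
- AVG(salary) OVER () AS avg_sal
- FROM employee;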
ENO | DNO | SALARY | AVG_SAL |
---|---|---|---|
7364 | 20 | 900 | 2173.21428 |
7494 | 30 | 1700 | 2173.21428 |
7522 | 30 | 1350 | 2173.21428 |
7567 | 20 | 3075 | 2173.21428 |
7652 | 30 | 1350 | 2173.21428 |
7699 | 30 | 2950 | 2173.21428 |
7783 | 10 | 2550 | 2173.21428 |
7789 | 20 | 3100 | 2173.21428 |
7838 | 10 | 5100 | 2173.21428 |
7845 | 30 | 1600 | 2173.21428 |
7877 | 20 | 1200 | 2173.21428 |
7901 | 30 | 1050 | 2173.21428 |
7903 | 20 | 3100 | 2173.21428 |
7935 | 10 | 1400 | 2173.21428 |
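The following result set shows, for each employee, the previous salary within the same department (PREV), which is what LAG returns. The original query is not shown; a sketch of the likely form (column aliases assumed):
- SELECT dtno,
- eno,
- ename,
- job,
- sal,
- LAG(sal, 1, 0) OVER (PARTITION BY dtno ORDER BY sal) AS prev
- FROM employee;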
DTNO | ENO | ENAME | JOB | SAL | PREV |
---|---|---|---|---|---|
10 | 7931 | STEVE | CLERK | 1300 | 0 |
10 | 7783 | JOHN | MANAGER | 2450 | 1300 |
10 | 7834 | KING | PRESIDENT | 5000 | 2450 |
20 | 7364 | ROBIN | CLERK | 800 | 0 |
20 | 7876 | BRIAN | CLERK | 1100 | 800 |
20 | 7567 | SHANE | MANAGER | 2975 | 1100 |
20 | 7784 | SCOTT | ANALYST | 3000 | 2975 |
20 | 7908 | KANE | ANALYST | 3000 | 3000 |
30 | 7900 | JAMES | CLERK | 950 | 0 |
30 | 7651 | CONNER | SALESMAN | 1250 | 950 |
30 | 7522 | MATTHEW | SALESMAN | 1250 | 1250 |
30 | 7843 | VIVIAN | SALESMAN | 1500 | 1250 |
30 | 7494 | ALLEN | SALESMAN | 1600 | 1500 |
30 | 7695 | GLEN | MANAGER | 2850 | 1600 |
- SELECT eno,
- empname,
- job,
- salary,
- LEAD(salary, 1, 0) OVER (ORDER BY salary) AS salary_next,
- LEAD(salary, 1, 0) OVER (ORDER BY salary) - salary AS salary_diff
- FROM employee;
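The result set below does not show the LEAD columns from the query above; instead it lists each department's minimum salary (MIN_RESULT), so it most likely comes from a MIN window query of roughly this form (column aliases assumed):
- SELECT eno,
- empname,
- dtno,
- salary,
- MIN(salary) OVER (PARTITION BY dtno) AS min_result
- FROM employee;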
ENO | EMPNAME | DTNO | SALARY | MIN_RESULT |
---|---|---|---|---|
7782 | CLARK | 10 | 2450 | 1300 |
7839 | KING | 10 | 5000 | 1300 |
7934 | MILLER | 10 | 1300 | 1300 |
7566 | JONES | 20 | 2975 | 800 |
7902 | FORD | 20 | 3000 | 800 |
7876 | ADAMS | 20 | 1100 | 800 |
7369 | SMITH | 20 | 800 | 800 |
7788 | SCOTT | 20 | 3000 | 800 |
7521 | WARD | 30 | 1250 | 950 |
7844 | TURNER | 30 | 1500 | 950 |
7499 | ALLEN | 30 | 1600 | 950 |
7900 | JAMES | 30 | 950 | 950 |
7698 | BLAKE | 30 | 2850 | 950 |
7654 | MARTIN | 30 | 1250 | 950 |
- SELECT eno,
- empname,
- dtno,
- salary,
- MAX(salary) OVER () AS max_result
- FROM employee;
Interview Questions and Answers
Shell Scripts
1. Find Second Largest + Second Smallest and Add Them
#!/bin/bash
arr=(5 1 9 6 1 2 8)
sorted=($(printf "%s\n" "${arr[@]}" | sort -n | uniq))
second_smallest=${sorted[1]}
second_largest=${sorted[-2]}
sum=$((second_smallest + second_largest))
echo "Second Smallest: $second_smallest"
echo "Second Largest: $second_largest"
echo "Sum: $sum"
2. Find Unique Characters and Longest Substring Without Repetition
#!/bin/bash
input="abcabcbb"
longest=""
temp=""
for (( i=0; i<${#input}; i++ )); do
char="${input:i:1}"
if [[ "$temp" == *"$char"* ]]; then
temp="${temp#*$char}${char}"
else
temp="$temp$char"
fi
if [ ${#temp} -gt ${#longest} ]; then
longest="$temp"
fi
done
echo "Longest substring without repetition: $longest"
3. Reverse a String with Tests
reverseStr() {
local str="$1"
echo "$str" | rev
}
doTestsPass() {
local result=0
if [[ "$(reverseStr 'abcd')" == "dcba" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr 'odd abcde')" == "edcba ddo" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr 'even abcde')" == "edcba neve" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr "$(reverseStr 'no change')")" == "no change" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr '')" == "" ]]; then result=$((result + 1)); fi
if [[ $result -eq 5 ]]; then
echo "All tests pass"
else
echo "There are test failures"
fi
}
doTestsPass
4. Print Line 50 to 60
sed -n '50,60p' filename.txt
5. Search Pattern and Count Occurrence
grep -o "pattern" filename.txt | wc -l
6. Find Dot Product of Two Arrays
#!/bin/bash
a=(1 2 3)
b=(4 5 6)
dot_product=0
for i in "${!a[@]}"; do
dot_product=$((dot_product + a[i]*b[i]))
done
echo "Dot Product: $dot_product"
7. Check and Print Integers from a String
#!/bin/bash
str="abc123def45gh6"
echo "$str" | grep -o '[0-9]\+'
8. Find 3rd Least Array Value
#!/bin/bash
arr=(5 2 8 1 7 9)
sorted=($(printf "%s\n" "${arr[@]}" | sort -n | uniq))
echo "Third Least: ${sorted[2]}"
9. Check Process Status
#!/bin/bash
process="sshd"
if pgrep "$process" > /dev/null
then
echo "$process is running"
else
echo "$process is not running"
fi
10. Delete Blank Lines
sed -i '/^$/d' filename.txt
11. Lines Greater Than 5 Characters (awk)
awk 'length($0) > 5' filename.txt
12. Print Last 10 Lines
tail -n 10 filename.txt
13. Check Palindrome
#!/bin/bash
read -p "Enter a string: " str
rev_str=$(echo "$str" | rev)
if [[ "$str" == "$rev_str" ]]; then
echo "Palindrome"
else
echo "Not Palindrome"
fi
14. Check File Exists, Size, Send Email
#!/bin/bash
file="yourfile.txt"
if [[ -f "$file" ]]; then
size=$(stat -c%s "$file")
echo "File size: $size bytes"
if (( size > 0 )); then
echo "File exists and is non-empty" | mail -s "File Status" your@email.com
else
echo "File is empty" | mail -s "File Empty Alert" your@email.com
fi
else
echo "File not found" | mail -s "File Not Found" your@email.com
fi
AutoSys & Job Scheduling
On Hold vs On Ice
- On Hold: Job is manually stopped. Needs manual release.
- On Ice: Job is skipped and stays on ice until manually taken off ice; downstream jobs can still run as if it had succeeded.
Predecessor and Successor
- Predecessor: Job that must complete first.
- Successor: Job that runs after predecessor finishes.
Job Schedule Example (Cron)
0 10 * * 1-5 /path/to/your/script.sh
# Runs Monday to Friday at 10 AM
Migration Testing Points
- Validate job dependencies.
- Check environment variables.
- Check file watchers, time triggers.
- Validate calendars, schedules.
- Test job logs and outputs.
- Test email alerts.
Java Programming
Kill Process Tree
import java.util.*;
public class KillProcess {
public static void main(String[] args) {
int[] pid = {1, 3, 10, 5};
int[] ppid = {3, 0, 5, 3};
int kill = 5;
List<Integer> result = new ArrayList<>();
Map<Integer, List<Integer>> map = new HashMap<>();
for (int i = 0; i < ppid.length; i++) {
map.computeIfAbsent(ppid[i], k -> new ArrayList<>()).add(pid[i]);
}
Queue<Integer> queue = new LinkedList<>();
queue.add(kill);
while (!queue.isEmpty()) {
int curr = queue.poll();
result.add(curr);
if (map.containsKey(curr)) {
queue.addAll(map.get(curr));
}
}
System.out.println(result);
}
}
Find Odd and Even Numbers
int[] arr = {2, 3, 4, 5, 6, 3, 7};
for (int i = 0; i < arr.length; i++) {
    System.out.println(arr[i] + (arr[i] % 2 == 0 ? " is even" : " is odd"));
}
Character Frequency Count
String str = "abcdascgab";
Map<Character, Integer> map = new HashMap<>();
for(char c : str.toCharArray()) {
map.put(c, map.getOrDefault(c,0)+1);
}
map.forEach((k,v) -> System.out.println(k + " -> " + v));
Longest Occurrence of Character
public class LongestChar {
public static void main(String[] args) {
String s = "aaabbccccdde";
int maxLen = 0, currLen = 1, start = 0, maxStart = 0;
for (int i = 1; i < s.length(); i++) {
if (s.charAt(i) == s.charAt(i-1)) {
currLen++;
} else {
if (currLen > maxLen) {
maxLen = currLen;
maxStart = i - currLen;
}
currLen = 1;
}
}
if (currLen > maxLen) {
maxLen = currLen;
maxStart = s.length() - currLen;
}
System.out.println("Character: " + s.charAt(maxStart));
System.out.println("Starting Index: " + maxStart);
System.out.println("Length: " + maxLen);
}
}
Python Programs
Find Missing Characters to Form Pangram
import string
def missing_chars(input_str):
    input_str = input_str.lower()
    missing = [ch for ch in string.ascii_lowercase if ch not in input_str]
    return ''.join(missing)
print(missing_chars("the quick brown fox jumps over the dog"))
First Non-Repeating Character
def first_unique_char(s):
    for c in s:
        if s.count(c) == 1:
            return c
    return None
print(first_unique_char("apple"))
Unix, Git, CI/CD, SDLC Knowledge
Unix Commands
- cd: change directory
- pwd: print working directory
- ls: list files
- mkdir: create directory
- rm: remove files
- sed: text manipulation
- awk: pattern scanning and processing
CI/CD Pipeline Flow
- Code Checkout ➔ Build ➔ Test ➔ Deploy ➔ Monitor
- Tools: Git, Jenkins, Nexus, SonarQube
SDLC Phases
- Planning
- Analysis
- Design
- Development
- Testing
- Deployment
- Maintenance
Git Rebase
Reapplies a branch's commits on top of another base commit so the history stays linear.
Merge Conflict Resolution
- Pull changes.
- Edit conflicts manually.
- Mark resolved.
- Commit.
ALTER Keyword usage
Use | Syntax | Example |
---|---|---|
Rename table | ALTER TABLE old_table_name RENAME TO new_table_name; | ALTER TABLE employees RENAME TO staff; |
Add column | ALTER TABLE table_name ADD column_name datatype; | ALTER TABLE employees ADD salary DECIMAL(10,2); |
Drop column | ALTER TABLE table_name DROP COLUMN column_name; | ALTER TABLE employees DROP COLUMN salary; |
Rename column | ALTER TABLE table_name RENAME COLUMN old_column_name TO new_column_name; | ALTER TABLE employees RENAME COLUMN name TO full_name; |
Modify datatype | ALTER TABLE table_name MODIFY column_name new_datatype; (MySQL) | ALTER TABLE employees MODIFY salary BIGINT; |
Add primary key | ALTER TABLE table_name ADD PRIMARY KEY (column_name); | ALTER TABLE employees ADD PRIMARY KEY (employee_id); |
Drop primary key | ALTER TABLE table_name DROP PRIMARY KEY; | ALTER TABLE employees DROP PRIMARY KEY; |
Add foreign key | ALTER TABLE child_table ADD CONSTRAINT fk_name FOREIGN KEY (child_column) REFERENCES parent_table(parent_column); | ALTER TABLE orders ADD CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers(id); |
Drop foreign key | ALTER TABLE table_name DROP FOREIGN KEY fk_name; | ALTER TABLE orders DROP FOREIGN KEY fk_customer; |
SQL and Database Interview Questions and Answers
1. What is normalization?
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.
2. Difference between TRUNCATE and DELETE
TRUNCATE removes all rows instantly without logging individual row deletions and cannot be rolled back (DDL). DELETE removes rows one by one and can be rolled back (DML).
3. Explain DDL and DML
DDL (Data Definition Language) defines schema (CREATE, ALTER). DML (Data Manipulation Language) manipulates data (SELECT, INSERT, UPDATE, DELETE).
4. What is an index?
An index improves the speed of data retrieval operations on a table.
5. Explain joins
Joins combine rows from two or more tables based on a related column. Types: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN.
6. Definition of SQL, DDL, JOIN
SQL is Structured Query Language. DDL defines schema structure (e.g., CREATE). JOIN combines rows from two tables.
7. What is a primary key and its advantages?
A primary key uniquely identifies each record in a table and ensures entity integrity, preventing NULLs and duplicates.
8. Write a query for employee-manager hierarchy
SELECT m.empname AS Manager, e.empname AS Employee
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id;
9. How to sum two values without using functions?
SELECT col1 + col2 AS total FROM table_name;
10. What are constraints in SQL? Why are they used?
Constraints enforce rules on data columns: NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT.
11. Fetch duplicate records from a table
SELECT column1, COUNT(*) FROM table GROUP BY column1 HAVING COUNT(*) > 1;
12. Write SQL to get the third highest salary
SELECT DISTINCT salary FROM employee ORDER BY salary DESC LIMIT 1 OFFSET 2;
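An alternative that also handles salary ties, using a window function (standard SQL; assumes the same employee table as above):
SELECT DISTINCT salary
FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
      FROM employee) ranked
WHERE rnk = 3;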
13. Get manager name using self join
SELECT e.name AS Employee, m.name AS Manager
FROM employees e
JOIN employees m ON e.manager_id = m.emp_id;
14. What is a view?
A view is a virtual table based on a SQL query. It does not store data itself.
15. What is a materialized view? Difference from a normal view?
A materialized view stores the query result physically and is refreshed periodically, unlike a regular view which fetches data dynamically.
16. Can we update a view?
Yes, if the view is based on a single table without aggregations or GROUP BY.
17. Can we use DELETE without WHERE clause?
Yes, but it will delete all rows from the table. Use cautiously.
18. Reasons for SQL procedure delay?
Could include locks, missing indexes, large data volumes, suboptimal query plans, outdated statistics.
19. What are triggers?
Triggers are automatic actions fired in response to INSERT, UPDATE, or DELETE events on a table.
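A minimal sketch of a trigger using MySQL-style syntax; the employee_audit table and its columns are hypothetical:
CREATE TRIGGER trg_employee_audit
AFTER UPDATE ON employee
FOR EACH ROW
INSERT INTO employee_audit (emp_id, old_salary, new_salary, changed_at)
VALUES (OLD.emp_id, OLD.salary, NEW.salary, NOW());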
20. How to establish DB connection?
Use a connection string with hostname, port, database name, username, password, and driver details (JDBC/ODBC).
21. What is EXPLAIN PLAN?
EXPLAIN PLAN shows the execution path a SQL query will follow, helping to optimize performance.
22. Difference between DELETE, DROP, and TRUNCATE
DELETE removes specific rows, DROP removes the table itself, TRUNCATE removes all rows quickly without logging each removal.
23. Use of views?
Views simplify query complexity, enhance security, and help in modular database design.
24. Types of joins in Oracle?
Oracle supports INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and old-style (+) joins.
25. Write query where empid = managerid
SELECT empname FROM employee WHERE empid = managerid;
26. What are JDBC parameters?
Include database URL, username, password, driver class, and optional settings like timeouts and SSL properties.
27. What is SQLException?
SQLException is thrown when there is an issue accessing the database, like syntax error, connection failure, constraint violation, etc.
All Interview Questions and Answers
Shell Script Questions
1. Find Second Largest and Second Smallest and Add
arr=(12 13 3 4 1 6 9 17 13)
unique_arr=($(echo "${arr[@]}" | tr ' ' '\n' | sort -n | uniq))
second_smallest=${unique_arr[1]}
second_largest=${unique_arr[-2]}
sum=$((second_smallest + second_largest))
echo "Sum: $sum"
2. Reverse a String
reverseStr() {
local str="$1"
echo "$str" | rev
}
doTestsPass() {
local result=0
if [[ "$(reverseStr 'abcd')" == "dcba" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr 'odd abcde')" == "edcba ddo" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr 'even abcde')" == "edcba neve" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr "$(reverseStr 'no change')")" == "no change" ]]; then result=$((result + 1)); fi
if [[ "$(reverseStr '')" == "" ]]; then result=$((result + 1)); fi
if [[ $result -eq 5 ]]; then
echo "All tests pass"
else
echo "There are test failures"
fi
}
doTestsPass
3. Find Unique Characters and Print Longest Possible Substring
input="tisishatrecupttt"
declare -A freq
for ((i=0; i<${#input}; i++)); do
char="${input:$i:1}"
freq[$char]=$(( ${freq[$char]} + 1 ))
done
unique=""
for c in $(echo ${!freq[@]}); do
if [[ ${freq[$c]} -eq 1 ]]; then
unique+=$c
fi
done
echo "Unique characters substring: $unique"
4. Print Line 50 to 60 of a File
sed -n '50,60p' filename.txt
5. Search Pattern and Count Frequency
grep -o 'pattern' filename.txt | wc -l
6. Shell Script: Palindrome Check
str="$1"
rev_str=$(echo "$str" | rev)
if [[ "$str" == "$rev_str" ]]; then
echo "Palindrome"
else
echo "Not Palindrome"
fi
7. Check File Exists and Size and Send Email
if [[ -f "$file" ]]; then
if [[ $(stat -c%s "$file") -gt 0 ]]; then
echo "File exists and not empty" | mail -s "File Alert" user@example.com
fi
fi
8. Delete Blank Lines in a File
sed -i '/^$/d' filename.txt
9. AWK: Print Lines Greater Than Length 5
awk 'length($0) > 5' filename.txt
10. Print Last 10 Lines
tail -n 10 filename.txt
AutoSys Related Questions
On Hold vs On Ice
- On Hold: Job will not run until manually released. Successor jobs do not run.
- On Ice: Job will not run but successors can run as if it succeeded.
JIL File
Job Information Language (JIL) is used to define AutoSys jobs with attributes such as command, machine, owner, start times, etc.
Predecessor/Successor Concept
Predecessor: Job that must finish before another can start.
Successor: Job that depends on the predecessor.
Job Schedule
Defined using "start_times", "days_of_week", "run_calendar" attributes in AutoSys.
Java and Python Coding
1. Find Odd and Even Numbers
int[] arr = {2, 3, 4, 5, 6, 3, 7};
for (int num : arr) {
if (num % 2 == 0) System.out.println(num + " Even");
else System.out.println(num + " Odd");
}
2. Kill Process Problem
Map<Integer, List<Integer>> tree = new HashMap<>();
for (int i = 0; i < ppid.length; i++) {
tree.computeIfAbsent(ppid[i], k -> new ArrayList<>()).add(pid[i]);
}
Queue<Integer> queue = new LinkedList<>();
queue.add(kill);
while (!queue.isEmpty()) {
int current = queue.poll();
result.add(current);
if (tree.containsKey(current)) {
queue.addAll(tree.get(current));
}
}
3. Find Frequency of Characters
String s = "abcdascgab";
Map<Character, Integer> freq = new HashMap<>();
for (char c : s.toCharArray()) {
freq.put(c, freq.getOrDefault(c, 0) + 1);
}
4. Remove Lines with Null 6th Field
BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"));
String line;
while ((line = reader.readLine()) != null) {
String[] fields = line.split(",");
if (fields.length >= 6 && fields[5] != null && !fields[5].isEmpty()) {
writer.write(line);
writer.newLine();
}
}
reader.close();
writer.close();
Conceptual Questions
Git
- Rebase: Reapply commits on top of another branch.
- Merge Conflict: Happens when two branches changed same line; resolve manually and commit.
Unix
- AWK: Text processing and extraction tool.
- Extract 20th line:
sed -n '20p' filename.txt
SDLC Process
Phases: Planning, Analysis, Design, Development, Testing, Deployment, Maintenance.
CI/CD Pipeline
Build - Test - Deploy automatically on code changes using tools like Jenkins, GitHub Actions, etc.
Logical Questions
1. Dot Product of Two Arrays
int[] a = {1, 2, 3};
int[] b = {4, 5, 6};
int result = 0;
for (int i = 0; i < a.length; i++) result += a[i] * b[i];
2. Print Integers from String
String input = "a1b2c3";
for (char c : input.toCharArray()) {
if (Character.isDigit(c)) System.out.print(c + " ");
}
AutoSys Job Status Cheat Sheet
What happens if a Box is on HOLD or ICE?
Action | Meaning | Effect on Box | Effect on Jobs inside |
---|---|---|---|
ON HOLD | Manual pause | Box stays READY, but jobs inside do NOT run | Child jobs are NOT allowed to start |
ON ICE | Freeze | Box is frozen, won't even evaluate jobs inside | Child jobs are NOT evaluated or started |
Box ON HOLD
The box remains active but child jobs do not run until the hold is released.
Box ON ICE
The box and child jobs are frozen. No evaluation happens. After removing ICE, only future events are considered.
AutoSys Job Statuses Explained
Status | Meaning | Description |
---|---|---|
INACTIVE | Idle | Job created, waiting for time/condition to trigger. |
ACTIVATED | Ready | Time/dependency met, waiting to run. |
STARTING | Launching | Starting process initiated but not running yet. |
RUNNING | Executing | Job is currently executing. |
SUCCESS | Completed | Job finished successfully. |
FAILURE | Failed | Job failed due to error. |
TERMINATED | Killed | Manually killed job. |
ON_HOLD | Paused | Manually put on hold; will not run. |
ON_ICE | Frozen | Completely frozen; not even evaluated. |
QUE_WAIT | Waiting for Queue | Waiting for machine queue availability. |
WAIT_REPLY | Wait for machine | Waiting for machine/agent reply. |
WAIT_START_TIME | Scheduled Wait | Waiting for start time. |
RESTART | Retry | Retrying job after failure. |
EVENT_WAIT | Waiting Event | Waiting for an external trigger/event. |
UNKNOWN | Lost | Agent disconnected or unreachable. |
How to Change Status (Common Commands)
sendevent -E JOB_ON_HOLD -J jobname – Put a job ON HOLD
sendevent -E JOB_OFF_HOLD -J jobname – Release HOLD from a job
sendevent -E JOB_ON_ICE -J jobname – Put a job ON ICE
sendevent -E JOB_OFF_ICE -J jobname – Remove ICE from a job
sendevent -E FORCE_STARTJOB -J jobname – Force start a job
sendevent -E KILLJOB -J jobname – Kill a job
If You See This, Then...
If you see... | It means... |
---|---|
INACTIVE | Job is waiting for schedule. |
ACTIVATED | Job ready to start. |
RUNNING | Job is executing. |
SUCCESS | Job completed successfully. |
FAILURE | Job failed, check logs. |
ON_HOLD | Needs manual release. |
ON_ICE | Frozen, manual intervention required. |
UNKNOWN | Lost communication, investigate machine/agent. |
Summary
HOLD = Pause but still aware of time/dependency.
ICE = Fully frozen, ignores schedules until manually released.
SUCCESS/FAILURE = Used to decide next steps in job chains.
PySpark Interview Preparation Guide
Day 1: PySpark Basics & Core Concepts
- What is PySpark: Python API for Apache Spark used for large-scale data processing.
- Spark Architecture: Consists of Driver, Executors, Cluster Manager.
- RDD vs DataFrame vs Dataset: RDD is the low-level API; DataFrame is optimized and easier to use; Dataset adds compile-time type safety but exists only in Scala/Java, not in PySpark.
- Transformations vs Actions: Transformations are lazy; Actions trigger computation.
- Lazy Evaluation: Optimization mechanism that delays execution until necessary.
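A minimal PySpark sketch of the Day 1 points above (lazy transformation vs. an action that triggers execution); the app name is arbitrary:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("day1-basics").getOrCreate()
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
doubled = rdd.map(lambda x: x * 2)           # transformation: lazy, only builds lineage
total = doubled.reduce(lambda a, b: a + b)   # action: triggers the actual computation
print(total)                                 # 30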
Day 2: RDD Operations & DataFrame API
- RDD operations: map, flatMap, filter, reduceByKey.
- DataFrame creation: from RDD or structured data.
- DataFrame methods: select, filter, groupBy, agg, withColumn, drop, cast, alias.
- File formats: CSV, JSON, Parquet reading and writing.
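A short sketch of the Day 2 DataFrame operations above; the CSV/Parquet paths and column names are placeholders, and `spark` is the SparkSession from the Day 1 sketch:
df = spark.read.csv("/data/sales.csv", header=True, inferSchema=True)
summary = (df.select("country", "amount")
             .filter(df.amount > 100)
             .groupBy("country")
             .agg({"amount": "sum"})
             .withColumnRenamed("sum(amount)", "total_amount"))
summary.write.mode("overwrite").parquet("/data/sales_summary")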
Day 3: Joins, UDFs & SQL in PySpark
- Join types: inner, left, right, outer joins.
- SQL queries: Registering temp views and running SQL on DataFrames.
- UDFs: Create custom transformation logic with User Defined Functions.
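A sketch combining the Day 3 topics (temp view, SQL, and a UDF); data and column names are made up, and `spark` is an existing SparkSession:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

sales = spark.createDataFrame([("US", 1200), ("IN", 800)], ["country", "amount"])
sales.createOrReplaceTempView("sales")
high_value = spark.sql("SELECT country, amount FROM sales WHERE amount > 1000")

label_udf = udf(lambda amt: "high" if amt > 1000 else "low", StringType())
labeled = sales.withColumn("amount_label", label_udf(sales.amount))
labeled.show()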
Day 4: Window Functions & Complex Operations
- Window Functions: row_number, rank, dense_rank, lead, lag.
- Partitioning: Use of partitionBy and orderBy in window specs.
- Pivot: Reshape DataFrame using pivot/unpivot operations.
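A window-function sketch for Day 4; the toy employee data and column names are assumptions, and `spark` is an existing SparkSession:
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, dense_rank, lag

emp = spark.createDataFrame(
    [(1, "IT", 3000), (2, "IT", 4000), (3, "HR", 2500)],
    ["emp_id", "dept", "salary"])
w = Window.partitionBy("dept").orderBy("salary")
ranked = (emp.withColumn("row_num", row_number().over(w))
             .withColumn("dense_rnk", dense_rank().over(w))
             .withColumn("prev_salary", lag("salary", 1).over(w)))
ranked.show()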
Day 5: Performance Tuning & Optimization
- Catalyst Optimizer: Optimizes query plans in Spark SQL.
- Tungsten Engine: Handles memory and binary code optimization.
- Partitioning: Efficient data distribution using repartition and coalesce.
- Caching & Persistence: Store intermediate results in memory or disk.
- Broadcast Join: Used when one dataset is small enough to fit in memory.
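A Day 5 sketch showing repartitioning, caching, and a broadcast join; the Parquet paths and join key are placeholders, and `spark` is an existing SparkSession:
from pyspark.sql.functions import broadcast

big_df = spark.read.parquet("/data/transactions")
small_df = spark.read.parquet("/data/country_codes")

big_df = big_df.repartition(200, "country_code").cache()   # tune partitions, reuse in memory
joined = big_df.join(broadcast(small_df), on="country_code", how="left")  # broadcast the small side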
Day 6: PySpark with Machine Learning (MLlib)
- MLlib: Spark's machine learning library.
- Pipeline: Chain of Transformers and Estimators.
- VectorAssembler: Combine features into a single vector column.
- StandardScaler: Normalize features.
- Models: LinearRegression, LogisticRegression.
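A minimal MLlib pipeline sketch for Day 6; the feature columns, labels, and toy data are assumptions, and `spark` is an existing SparkSession:
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression

train_df = spark.createDataFrame(
    [(25, 40000.0, 0.0), (42, 90000.0, 1.0), (35, 60000.0, 1.0)],
    ["age", "income", "label"])
assembler = VectorAssembler(inputCols=["age", "income"], outputCol="features_raw")
scaler = StandardScaler(inputCol="features_raw", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, scaler, lr]).fit(train_df)
predictions = model.transform(train_df)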
Day 7: Real-time Scenarios + Mock Interview
- Real-time Use Cases: Handling ETL, ingestion pipelines, and optimizations.
- Performance Bottlenecks: Identifying and resolving slow Spark jobs.
- Common Issues: Data skew, large joins, memory pressure.
- Mock Questions: End-to-end project explanation, tuning strategies, troubleshooting steps.
Use this guide to prepare thoroughly for PySpark interviews from basic to advanced levels. Each day is structured for progressive learning and hands-on practice.
PySpark Interview Questions and Answers
Question | Answer | Explanation |
---|---|---|
Which method below is used to create a temporary view on DataFrame? | DataFrame.createOrReplaceTempView("View Name") | This method registers a DataFrame as a temporary table for SQL queries in Spark. |
Which of the below Spark Core operations are wide transformations and result in data shuffling? | groupBy | groupBy triggers data shuffling across the network, which makes it a wide transformation. |
Which of the below option is correct to persist RDD only in primary memory? | rdd.persist(StorageLevel.MEMORY_ONLY) | This persists the RDD in memory only; if memory is not sufficient, it will recompute when needed. |
Which method can be used to verify the number of partitions in RDD? | RDD.getNumPartitions() | getNumPartitions() returns the number of partitions in an RDD. |
Which code snippet correctly converts dataset to DataFrame using namedtuple? | transDF = sc.textFile(...).map(...).map(lambda c: Cust_Trans(c[0], c[1], c[2], int(c[3]))).toDF() | This code splits the line, maps it to a namedtuple, and converts it to a DataFrame. |
Which of the below method(s) is/are Spark Core action operations? | collect(), foreach(), reduce() | These are Spark actions that trigger computation and return results. |
Which method is used to read a JSON file as a DataFrame? | sparkSessionObj.read.json("Json file path") | This is the standard way to load a JSON file using SparkSession. |
Which method is used to increase RDD partitions for better parallelism? | RDD.repartition(Number of partitions) | repartition increases partitions and involves shuffling for balanced distribution. |
Which transformation aggregates values by key efficiently in paired RDD? | reduceByKey() | reduceByKey performs local aggregation before shuffling, making it efficient. |
Which method is used to save a DataFrame as a Parquet file in HDFS? | DataFrame.write.parquet("File path") | This method saves a DataFrame in Parquet format to the given path. |
Which object acts as a unified entry point for Spark SQL including Hive? | SparkSession | SparkSession is the main entry point for DataFrame and SQL functionality. |
Which transformation can only be applied on paired RDDs? | mapValues() | mapValues transforms only values, keeping keys unchanged; used only on key-value RDDs. |
Which method is used to save DataFrame to a Hive table? | DataFrame.write.option("path", "/path").saveAsTable("Banking.CreditCardData") | saveAsTable persists the DataFrame as a Hive table; the "path" option stores the data at the given location (external table). |
Which of the following is a Spark Action? | collect(), first, take | These actions trigger execution and return results from the RDD/DataFrame. |
How to extract the first and third column from an RDD? | data1.map(lambda col: (col[0], col[2])) | Accessing elements by index allows extraction of specific columns from an RDD. |
What is the output of foldByKey with add on [('a',1), ('b',2), ('a',3), ('a',4)]? | [('a', 8), ('b', 2)] | foldByKey with 0 as initial value sums values grouped by key using add. |
What is the output of countByValue() on RDD with [(11,1), (1,), (11,1)]? | [((11, 1), 2), ((1,), 1)] | countByValue counts how many times each element occurs in the RDD. |
Which method displays contents of DataFrame as a collection of Row? | data1.collect() | collect() returns the content as a list of Row objects. |
Which object is created by the system in Spark interactive mode? | SparkSession | SparkSession is automatically created in interactive mode for convenience. |
What is the difference between persist() and cache() in Spark? | cache() uses the default storage level (MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames); persist() lets you choose the level. | cache() is shorthand for persist() with the default storage level, while persist() accepts any custom StorageLevel. |
How does Spark handle data shuffling, and why is it expensive? | Spark redistributes data across partitions, causing I/O, network, and memory overhead. | Shuffling involves disk and network operations which slow down performance and require more resources. |
What is a broadcast variable in Spark and when should it be used? | Used to cache a read-only variable on all nodes to avoid shipping with tasks. | Broadcast variables are efficient for small datasets that are reused across many tasks. |
Explain the difference between narrow and wide transformations with examples. | Narrow (e.g., map): data from one partition. Wide (e.g., groupByKey): requires shuffle. | Narrow transformations don't require shuffling. Wide ones do and are more expensive. |
What are the different storage levels in Spark? | MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY, MEMORY_ONLY_SER, etc. | Storage levels define how RDDs are cached – in memory, disk, or serialized form. |
What is a DAG in Spark, and how is it used in job execution? | DAG is a Directed Acyclic Graph of stages representing computation lineage. | Spark builds a DAG of execution for transformations before running any action. |
What are accumulators in Spark and how are they different from broadcast variables? | Accumulators are write-only shared variables for aggregations; broadcast are read-only. | Accumulators are useful for debugging or counters; broadcast for small lookup data. |
How do DataFrame APIs differ from RDD APIs in Spark? | DataFrames are optimized using Catalyst and Tungsten; RDDs offer more control. | DataFrames are higher-level APIs with better performance; RDDs are more flexible but slower. |
What are some best practices for optimizing Spark jobs? | Use partitioning, caching, avoid shuffles, use broadcast joins, and monitor jobs. | Performance improves by reducing shuffles, tuning partitions, and reusing cached data. |
Explain checkpointing and why it is used in Spark streaming applications. | Checkpointing saves RDD lineage info to stable storage to recover from failures. | It helps prevent long lineage chains and supports recovery in streaming jobs. |
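To make the persist/cache and broadcast-variable answers above concrete, a short PySpark sketch; the data and lookup values are illustrative, and `spark` is an existing SparkSession:
from pyspark import StorageLevel

rdd = spark.sparkContext.parallelize(range(1000))
rdd.persist(StorageLevel.MEMORY_ONLY)        # explicit storage level for an RDD
# df.cache()  # for a DataFrame, cache() defaults to MEMORY_AND_DISK

codes = spark.sparkContext.broadcast({"US": "United States", "IN": "India"})
names = spark.sparkContext.parallelize(["US", "IN", "US"]).map(lambda c: codes.value[c])
print(names.collect())                       # read-only lookup on the executors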