Algorithm problems - prep set 1

From NoskeWiki
Jump to navigation Jump to search


NOTE: This page is a daughter page of: Algorithm problems

Prep set 1

Below is a nice set of question to prepare for an interview.

QUESTION 1: Given two binary trees write a function to see if both are identical (MED)

This question is asking us if the trees are identical in both values AND structure. Often people say "binary trees" to mean "binary search trees", but here they are not search trees (and even if they were it wouldn't change the solution). The solution is to do a depth first search on both trees at once and return false the first time you find a node that doesn't match the other either because (a) they have different values, or (b) one is null, but the other is not. If you get to the end without finding such a case you can return true. Here's a recursive solution:

bool areTreesIdentical(Node *a, Node *b)
{                                             // Base cases:
  if(a==NULL && b==NULL)                        // If both children are empty: this part of tree is identical.
    return true;
  if(a==NULL || b==NULL)                        // If only one child is empty: not identical.
    return false;
  if(a->value != b->value)                      // If different values: not identical.
    return true;
                                              // Recursive case: (we have more children)
  if(!areTreesIdentical(a->left,b->left))       // Search left nodes/sub-tree and if there are not identical
    return false;                               // don't bother searching the right.
  return areTreesIdentical(a->right,b->right);  // Search and return whether the right/sub-tree nodes are identical.
  • Big O analysis: time = O(n), space = O(log n) (to build our stack)

QUESTION 2: One integer in an array occurs an odd number of times, find which one (EASY)

The wording of this question is a little confusing; the full description: You are given an array of integers. All but one of these integers appear an even number of times. Write a function to determine the single integer appears an even number of times.

You can start by visualizing an example: {1,4,2,1,3,3,4,2,4,2,4} - in which case we'd want to return 2 (it occurs three times). There are a few different solutions to this. One method would be to create a hash table to tally the frequency of each unique number:

#include <vector>
#include <hash_map>
using namespace std;

int findOddOccuringInt(vector<int> &v)
  if(v.size()==0)  return (-1);          // Error checking: let's just pretend -1 represents an error.
  hash_map<int,int> freq;
  for(int i=0; i<v.size();i++)
    map[ v[i] ] += 1;                    // Increment the frequency associated with this integer.
  for(hash_map<int,int>::iterator it=map.begin(); it!=map.end(); it++)  {   // For each item in our hash map:
    if((*it)->second % 2 ==1)            // If value occurred an odd number of times:
      return (*it)->first;               // return key.
  return (-2);                           // Let's just pretend -2 signals that there is no odd occurring element.
  • Big O analysis: time = O(n) (should have pretty few collisions), space = O(n) (worst case, as may only have a few unique values)

An easier solution is if you simply sort the array using an O(n log n) algorithm like quick-sort. Suddenly our array becomes: {1,1,2,2,2,3,3,4,4,4,4}, and our code looks like this:

#include <vector>
#include <algorithm>      // for "sort"
using namespace std;

bool findOddOccuringInt(vector<int> v, int *oddOneOut)
  if(v==NULL || v.size()==0)
    return (false);
  v = sort(v.begin(), v.end());     // Uses the < operator and is average O(n log n).
  int currInt = v[0];
  int freq = 1;
  for(int i=1; i<v.size();i++)
    if(v[i] == currInt) {
    else if(freq % 2 == 1) {
       *oddOneOut = currInt;
       return (true);
    else {
      currInt = v[i];
      freq = 1;
  return (false);
  • Big O analysis: time = O(n log n), space = O(n) (if mutable or if we want to preserve order of original array, we have to create a copy)

The most efficient solution, however, is one you'll probably not guess: a bitwise XOR on every element. If you imagine the ordered array, you can probably image how an integer flips back to 0 when XOR'd by the same number a second time (or on any even occurrence). Even if the array is out of order, it turns out we'd still be left with the value of the odd number.

int findOddOccuringInt(vector<int> v)
  if(v==NULL || v.size()==0)
    return (-1);
  int xorSum = 0;
  for(int i=0; i<v.size();i++)
    xorSum = xorSum ^ v[i];
  return (-2);
  • Big O analysis: time = O(n), space = O(1)

QUESTION 3: Given an array of appointments (start and end times) return an minimal array showing busy periods

The full question here is: Write a function which inputs an array of appointments (where each appointment is a struct with two integers representing start and end time) then show me what periods of times that person is busy (without needing to see every appointment). The example I was shown was an input of: { (1-2), (2-3), (4-5) }, in which case the desired output would be: { (1-3), (4-5) }

When faced with this question I made two incorrect assumptions: (1) I assumed the array was ordered, and (2) I assumed times couldn't overlap. This was influenced a lot by the example shown to me, but the lesson is to never make assumptions. This question turns out to be a test of "edge cases" - your input isn't necessarily in order, and a person may have multiple overlapping bookings: { (8-11), (9-10), (2-6), (0-4), (0-1), (1-5) }.

First we'd need to order, but how would we order is a bit tricky. If we overload the "less than" operator we ordered by start and then end we get:

TIMES:   0  1  2  3  4  5  6  7  8  9  10 11
(0-1)    +--+
(0-4)    +-----------+
(1-5)       +-----+  .
(2-6)          +-----------+
(8-11)                     .     +--------+
(9-10)                              +--+  .

One thing we could do is then merge any elements which overlap, but to save the trouble of merging we can just keep track of the start and maximum end time (represented by the dots), and add a new busy time only when we see the next start time occurs after the current maximum end time.

class Appt
  int start;
  int end;
  void Appt(int s, int e) { start = s; end = e; }
  bool operator<(const Appt &rhs) const
    if(start != rhs.start)  return start < rhs.start;      // we want to order by start time
    else if(end != rhs.end) return end < rhs.end;          // then end time (NOTE: actually this line isn't critical for success)
    else                      return false;                // and return false if equal

vector<Appt> findBusyTimes(vector<Appt> &appt)
  vectory<Appt> busyTimes;                   // To record our busy times.
  if(appt.size()==0)  return busyTimes;      // If no appointments: return empty vector.
  sort(appt.begin(), appt.end());            // Sort the array using our custom < operator
                                             // ("sort" comes from the from the <algorithms> library).
  int start = appt[0].start;                 // Keep track of current "start time".
  int maxend = appt[0].end;                  // Keep track of current maximum end time.
  for(int i=1; i<v.size(); i++)              // For each appointment:
    if(v[i].start > maxend) {                              // If new start time is greater than our maximum end time covered
      busyTimes.push_back(new Appt (start, maxendt));      // add a new busy time with current start/maxend values.
      start = v[i].start;                                  // |-- Then start looking for next busy time.
      maxend = v[i].end;                                   // |
    else if (v[i].end > maxend) {                          // If the end time is greater than our current end time 
      maxend = v[i].end;                                   // increase the end of our current busy time.
  busyTimes.push_back(new Appt(start, maxendt));      // Add another busy time.... even if we have just 1 appointment
  return (busyTimes);                                 // we'll have at least one busy time to return.
  • Big O analysis: time = O(n log n), space = O(1)

QUESTION 4: Why would you use XML for a messenger applications to send a structure of ints between client and server machines rather than raw memory chunks (MED)

An alternative wording of this question: What problems may exist with a server (eg: messenger app) sending data to the client (say via TCP/IP) as the chunk of memory representing that structure (say a struct with three ints) - versus a system like XML?

Here it may help to write out an example of what would get sent:

struct Message {
  int a;
  int b;
  int c;

The three potential problems here are:

  1. Endianness / byte-ordering - one machine (eg: A Windows box, or MAC OS on x86) might be little endian with the least significant byte (LSB) first, while the other machine (eg: Mac OS with PowerPCs) may be big endian with the bytes in the opposite order with the most significant byte (MSB) first... and this difference in endianness would change our value when received.
  2. Different lengths - depending on the machine, programming language and compiler, different machines may have different length integers, meaning one machine might expect 32 bit long integers instead of 64 bit integers.
  3. No versioning - it's likely the messenger app may evolve and add extra elements (like another int) to the structure. XML tends to report version number and helps allow compatibilty between versions, but no so in this Serialized form.

Here the TCP/IP can ensure the message is not corrupted during sending, but over the Internet we are talking about a heterogeneous network, so its likely the server and clients can be all different architectures - different operation systems (some 32-bit, some 64-bit) and different endianness too. Another problem might be pointers, which are hard to del with and require "pointer swizzling" (converting pointers to id numbers), but we're not asked about that. Note that the "Serializable" class in Java and C# deals with "unswizzling" and architecture differences (like endianness), but in this question were asked about just the raw data. Raw data may lead to a smaller dataset, but XML is preferred because it tends to take care of differences in architecture, versioning and version compatibility (when implemented correctly).

QUESTION 5: Ten network connected machines each generate a billion random integers. Determine the medium value of all integers, without copying billions of values between machines.

The medium value is the value which occurs in the middle of an ordered sequence. In {1,2,5,6,7}, 5 is the median and for {1,5,6,7} the median value is 5.5 (if there's an even number of elements you average the two in the middle). Given an array, the median value can be found easily by ordering the elements with an O(n log n) algorithm, then picking the middle most ones, but here we're not allowed to copy all these values across. Create a simple example where, instead of 10 machines you have 2, and each contains 5 numbers instead of a billion.

Machine A:   4, 1, 5, 3, 2
Machine B:   5, 9, 9, 8, 1  

And let's sort this to see the answer easier:

Machine A:   1, 2, 3, 4, 5      (medium here = 3)           before sorting= 4, 1, 5, 3, 2
Machine B:   1, 5, 8, 9, 9      (medium here = 8)           before sorting= 5, 9, 9, 8, 1

Here the medium of each subset doesn't really help us establish that medium of all numbers (1,1,2,3,4,5,5,8,9,9) is 4.5 - but what we can see is that the number 4.5 has a total of 5 values above it, and 5 values below it. Instead of copying millions of values, we could thus have a single machine "guess" a value, and all the other machines (itself included) say how many values are above and below, so that the head machine can sum this as the total # less and total # greater and then can guess higher or lower each time. Each time we ask this question on an unsorted list we pay O(n).... and since we're unlikely to get it first time maybe we should first find the "total range of values" by finding the lowest and highest value over all machines. If r=our range of integers (and consider that a four byte integer has ~4 billion possible values) and guessed in the middle each time we'd have to pay O(n log r). If we sort the array firsts we pay an initial O(n log n), so that leads to O(n log n + log n log r).

A better guessing solution instead might implement a modified quicksort sorting algorithm. In the quicksort, one of the values in our array is randomly chosen as a "pivot" value, then all smaller values are moved towards the front (not necessarily in ascending order though), all greater values moved to the end, and then "pivot" itself put is put in the middle producing something that looks like this:

[elements this side are <= pivot]   [*pivot*]  [elements this side are >= pivot]

For example: if "2" is our random pivot in the array{4, 1, 5, 3, 2} we'd get:   1,*2*,4,5,2

In normal quicksort this gets we repeat this process recursively on both sides.... but in this case if we find there are more values greater than our pivot (over all machines) then we can ignore all values on the left of our first pivot (we know how many values are in this direction and that one of them can be the medium) - and instead pick one of the random number on the right side (a number which will be >= than our current pivot) as the next pivot. By tracking the pivots, we wouldn't have to sort the arrays - and probably we'd shift n then n/2, then n/4... and since 1+1/2+1/4 etc ends up as 2 our actual time complexity for the entire algorithm would be O(n)! Keep in mind however, that this is average case, in worse case we somehow pick a 1 then 2 and all the way up to r/2 - and this would be O(n^2).

QUESTION 6: What is a virtual method

The "virtual" keyword exists only in C++ - in C# or Java all methods (and destructors) are virtual, so you'll need to know about this concept. A virtual method is a method which gets "decided on" during run time, depending on the object type making the call. An example without the virtual keyword:

class BaseClass                 { public:  void print() { cout << "Base." };     }
class DerivedClass : BaseClass  { public:  void print() { cout << "Derived." };  }
void main() {
  BaseClass *ptr = new DerivedClass();
  ptr->print();       // will print "Base"

Here we probably want to declare print as virtual:

class BaseClass                 { public:  virtual void print() { cout << "Base." };     }
class DerivedClass : BaseClass  { public:  virtual void print() { cout << "Derived." };  }
void main() {
  BaseClass *ptr = new DerivedClass();
  ptr->print();       // Will print "Derived".

Notice the output is different. Look at virtual methods we have these advantages:

  • Allows run-time method selection.
  • Can be used to declare abstract methods (a method must write in any derived class).

However it's also worth mentioning the disadvantages:

  • Method is slower to invoke (first needs to lookup in a lookup table).
  • Requires a small amount of extra memory for the lookup table.

QUESTION 7: Explain what a virtual destructor is and solve an example (MED)

Above we explained a virtual method. A destructor (distinguished by '~') are used to remove a class object by cleaning up and deallocating memory. Virtual destructors have a bit of a trick to them however, as follows:

class Base                      
  Base()  { cout << "constructing BASE. "; }
  ~Base() { cout << "destroying   BASE. ";  }
class Derived : Base                             
  Derived()  { cout << "constructing DERIVED. "; }
  ~Derived() { cout << "destroying   DERIVED. "; }
void main() {
  Base *ptr = new Derived();        // Output: "constructing BASE.  constructing DERIVED. "
  delete (ptr);                     // Output: "destroying   BASE. "

As you can see, this can lead to memory leaks - none of the attributes in derived will get deleted. By using virtual destructors, both Derived and Base destructors will be called:

class Base                      
  Base()          { cout << "constructing BASE. "; }
  virtual ~Base() { cout << "destroying   BASE. "; }
...(as before)
void main() {
  Base *ptr = new Derived();        // Output: "constructing BASE.    constructing DERIVED. "
  delete (ptr);                     // Output: "destroying   DERIVED. destroying   BASE. "

QUESTION 8: Make a copy of a single linked list where each node contains a second pointer to a random other node (MED)

Writing a function to copy a linked list isn't too hard so we can write the code for that first part straight away. The real challenge here is the random pointer. Like most problems, it really helps to draw a diagram.

  [3] -> [2] -> [6] -> [8] -> NULL
   |     |_^    |       |______^
   |      ^_____+ ^

If you'll excuse my bad diagram you'll still see that each node not only has a "next" pointer, but also has a "randomNext" pointer which couple potentially point to: (a) itself, (b) null, (c) any node after it or (d) any node before it. Detecting for (a) and (b) is easy, but how can you tell where the random pointer points to?! The "brute force" solution would be to look at each node in order, then have a second pointer visit EVERY node to see if that's where randomNext points to, and then make the copy do the same. Since for each node we check ever other node this is O(n^2) - and thus very undesirable. What if we could at least order the values in the list so they read {1,2,3,4} - then we could then instantly see what index each randomNext points to - but in most problems we are told our input shouldn't be change or is immutable (can't be changed), and even if we could change the value back, it would still be a O(n) and pain to iterate through the new linked list to create a pointer. How then could we quickly reference the correct place? The solution I came up with is to store the memory addresses. This could be done in a vector, but if we wanted to leave the original list unchanged, what if we make a hash table which kept the memory address of each original node as its key, and as its value we could keep the corresponding memory address in the copy we are generating. In code this would look like this:

#include <hash_map>
#include <vector>
using namespace std;

struct Node
  int value;
  Node *next;
  Node *randomNext;
  Node(int val, Node *n, Node *rn) {
    value = val;
    next = n;
    randomNext = rn;

Node *copySpecialList(Node *aHead)
  if(aHead == NULL)
    return NULL;
  Node *bHead = new Node(aHead->value, NULL, NULL);
  Node *bPrev = bHead;
  Node *a = aHead;
  Node *b = NULL;
  while(a!= NULL)
     b = new Node(a->value, NULL, NULL);
     bPrev->next = b;
     bPrev = b;
     a = a->next;
  hash_map<*Node,*Node> map;
  a = aHead;
  b = bHead;
  while(a && b)
    map[a] = b;
    a = a->next;
    b = b->next;
  for(a=aHead, b=bHead; a && b; a=a->next, b=b->next)
    if(a->randomNext == NULL)
      b->randomNext = NULL;
      b->randomNext = map[ a->randomNext ];
  return bHead;

  • Big O analysis: time = O(n), space = O(n) (our hash table of memory addresses will likely have VERY few collisions and occupy ~the same space as our new list)

See Also