Algorithm problems - prep set 2
Below is a very small set of questions to prepare for an interview.
QUESTION 1: Return a random subset of X integers from an array of integers (MED)
I was asked this in an interview and kind of bombed it. My mistake was thinking it was a "warm-up question" because it sounded so easy, and I'd recently solved the same problem in Python by simply calling random shuffle and then resizing. Instead I should have applied my Programming Interview Strategy Sheet, and I would have asked up front: "is the original array mutable?", "how big can this array be?", "am I memory bound?", and "what should I return for an out-of-bounds X value?".
The quick solution is O(n) with std::shuffle (std::random_shuffle would also work, but it was deprecated in C++14 and removed in C++17):
#include <algorithm> // std::shuffle, std::min, std::max
#include <random>    // std::mt19937, std::random_device
#include <vector>
using std::vector;

vector<int> randSubsetTheEasyWay(vector<int> v, int x) {
    // Check x is a sensible value.
    x = std::min(std::max(x, 0), static_cast<int>(v.size()));
    std::shuffle(v.begin(), v.end(), std::mt19937{std::random_device{}()});
    v.resize(x);
    return v;
}
I suggested we could look at each element and give it an (x_remaining / n_remaining) chance of selection, but I second-guessed myself, decided the method wasn't truly uniform, and abandoned it (it actually is uniform - it's Knuth's selection sampling - see the sketch after the next code block). Then I realized it could be a waste to shuffle/process all n elements if only x elements need shuffling... so I came up with:
vector<int> randSubset(vector<int> v, int x) {
    // Check x is a sensible value.
    x = std::min(std::max(x, 0), static_cast<int>(v.size()));
    std::mt19937 gen{std::random_device{}()};
    for (int i = 0; i < x; i++) {
        // Pick an index among the remaining elements [i, v.size() - 1].
        std::uniform_int_distribution<int> pick(i, static_cast<int>(v.size()) - 1);
        std::swap(v[i], v[pick(gen)]); // Swap the randomly picked element to the front.
    }
    v.resize(x);
    return v;
}
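For completeness, here is a minimal sketch of the idea I abandoned - it is Knuth's selection sampling (Algorithm S), and it does produce a uniformly random subset. The function name and exact code are my own reconstruction:

vector<int> randSubsetSelectionSampling(const vector<int>& v, const int x) {
    // Walk the array once; keep each element with probability
    // (picks still needed) / (elements still unseen).
    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    vector<int> out;
    out.reserve(x);
    for (size_t i = 0; i < v.size() && static_cast<int>(out.size()) < x; ++i) {
        const int picksLeft = x - static_cast<int>(out.size());
        const double elementsLeft = static_cast<double>(v.size() - i);
        if (coin(gen) * elementsLeft < picksLeft)
            out.push_back(v[i]); // Selected; input order is preserved too.
    }
    return out;
}

It runs in O(n) time but only O(x) extra space, which is why it's worth remembering even though I dropped it.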
I also pointed out that if x is greater than n/2, it is more efficient to randomly swap (n - x) elements to the end and then resize to x as usual. Both versions, however, take the vector by value and so need O(n) extra memory. What if we had a million elements and only needed to pick two? I knew we needed a way to mark what had already been selected, and I very briefly mentioned a hash table, but my mistake was that I never coded it. That was probably the answer my interviewer wanted, because it needs only O(x) extra space and expected O(x) processing time (collisions are rare while x is small relative to n). It would look like this:
vector<int> randSmallSubset(const vector<int>& v, const int x) {
    // Check x is a sensible value. Also needs #include <unordered_set>.
    std::unordered_set<int> subsetIndexes;
    std::mt19937 gen{std::random_device{}()};
    std::uniform_int_distribution<int> pick(0, static_cast<int>(v.size()) - 1);
    for (int i = 0; i < x;) {
        int randidx = pick(gen); // Pick among all elements in the vector.
        if (subsetIndexes.find(randidx) == subsetIndexes.end()) { // Check for collisions.
            subsetIndexes.insert(randidx);
            i++;
        }
    }
    vector<int> returnV;
    returnV.reserve(x);
    for (const int idx : subsetIndexes)
        returnV.push_back(v[idx]);
    return returnV;
}
So now we have an expected O(x) solution, or O(x log x) if we'd used a std::set (a binary search tree) instead. Which approach wins really depends on the size of x relative to n, though... so you'd want a dispatching if statement to decide! Ultimately I believe there is no single solution to this question - you just have to discuss the trade-offs of each. A final solution might look something like this:
vector<int> randSubsetAdvanced(const vector<int>& v, const int x) {
    // Check x is a sensible value:
    if (x <= 0) return vector<int>();
    if (x >= static_cast<int>(v.size())) return v;
    // Determine the best subroutine:
    if (x < static_cast<int>(v.size()) / 2) {
        return randSmallSubset(v, x); // Few picks: expected O(x) time, O(x) extra space.
    } else {
        return randSubset(v, x); // Many picks: partial shuffle of a copy, O(n) extra space.
    }
}
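Finally, a quick hypothetical driver to show how you'd call it - it assumes the functions above are in scope and additionally needs <iostream>:

#include <iostream>

int main() {
    vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    for (const int value : randSubsetAdvanced(data, 3))
        std::cout << value << ' '; // Prints e.g. "7 2 9"; order isn't guaranteed.
    std::cout << std::endl;
    return 0;
}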
QUESTION 2: What are some of the changes in C++11?
This probably isn't something you'd memorize, but you likely use a few (a small example tying several of these together follows the list):
- nullptr: replaces the bug-prone NULL macro.
- defaulted & deleted functions:
    struct A { A() = default; virtual ~A() = default; };
    struct NoCopy {
        NoCopy& operator=(const NoCopy&) = delete;
        NoCopy(const NoCopy&) = delete;
    };
    NoCopy a;
    NoCopy b(a); // Compilation error: the copy constructor is deleted.
- automatic type deduction with auto (great for loops etc.).
- Standard Library changes:
  - smart pointers: shared_ptr, unique_ptr, and weak_ptr (auto_ptr is deprecated as of C++11).
  - new algorithms: all_of(), any_of(), none_of(), and copy_n().
  - new standard data structures: unordered_set, unordered_map, unordered_multiset, and unordered_multimap.
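To make a few of these concrete, here's a small toy example of my own that exercises nullptr, auto, unique_ptr, an unordered container, and one of the new algorithms:

#include <algorithm>
#include <iostream>
#include <memory>
#include <unordered_set>
#include <vector>

int main() {
    std::unique_ptr<int> p; // Smart pointer, empty by default.
    if (p == nullptr)       // nullptr instead of the old NULL macro.
        p.reset(new int(42));

    std::unordered_set<int> s = {1, 2, 3}; // New hash-based container.
    for (auto value : s)                   // auto type deduction in the loop.
        std::cout << value << ' ';
    std::cout << '\n';

    std::vector<int> v = {2, 4, 6};
    const bool allEven = std::all_of(v.begin(), v.end(),
                                     [](int n) { return n % 2 == 0; }); // New algorithm + lambda.
    std::cout << std::boolalpha << allEven << std::endl;
    return 0;
}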