Reinforcement learning algorithms depend on observing rewards for the actions they take. The assumption of fully observable rewards, however, can be infeasible in certain scenarios, due to either cost or the nature of the problem. Of specific interest here is the challenge of learning a policy when rewards are delayed, aggregated, and anonymous (DAAF), a problem that has been addressed in the bandits literature and, to the best of our knowledge, to a lesser extent in the more general reinforcement learning (RL) setting. We introduce a novel formulation that mirrors scenarios encountered in real-world applications, characterized by intermittent and aggregated reward observations. To address these constraints, we develop four new algorithms: the first employs least squares to estimate the true rewards; the second and third adapt Q-learning and SARSA, respectively, to our setting; and the fourth leverages a policy-with-options framework. Through a systematic experimental analysis, we compare these methods and demonstrate that three of them can approximate policies nearly as effectively as those learned with complete reward information, with only minor performance degradation due to the informational constraints. Our findings pave the way for more robust RL applications in environments with limited reward feedback.
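
The least-squares idea behind the first algorithm can be conveyed with a minimal sketch, under the assumption that each aggregated reward is the sum of the unobserved per-step rewards within a trajectory segment: the per-(state, action) mean rewards are then recovered by regressing the observed aggregates on segment visit counts. This is only an illustrative sketch, not the paper's exact formulation; the function and variable names are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): recovering per-(state, action)
# mean rewards from delayed, aggregated, anonymous feedback via least squares.
# Segment structure, names, and the toy data below are hypothetical.
import numpy as np

def estimate_rewards(segments, n_states, n_actions):
    """Each segment is (list of (state, action) pairs, aggregated reward observed
    at the end of the segment). Returns a least-squares estimate of the mean
    reward for every (state, action) pair."""
    n_features = n_states * n_actions
    X = np.zeros((len(segments), n_features))   # visit counts per segment
    y = np.zeros(len(segments))                 # aggregated reward per segment
    for i, (pairs, agg_reward) in enumerate(segments):
        for s, a in pairs:
            X[i, s * n_actions + a] += 1.0      # anonymous: only counts are known
        y[i] = agg_reward
    # Solve min_r ||X r - y||_2; rcond=None keeps numpy's default cutoff.
    r_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return r_hat.reshape(n_states, n_actions)

# Hypothetical usage: two segments of a toy 2-state, 2-action problem.
segments = [
    ([(0, 1), (1, 0), (0, 1)], 2.5),  # aggregate of three anonymous step rewards
    ([(1, 1), (0, 0)], 1.0),
]
print(estimate_rewards(segments, n_states=2, n_actions=2))
```

With enough segments covering all (state, action) pairs, such an estimate could feed a standard planner or value-based learner in place of the unobserved per-step rewards.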