Big Data Analytics Project Using Spark or Hive on AWS

Results and Insights: After applying the above methodology, the Random Forest Classifier was found to be the best model for predicting customer churn, with an accuracy of 95.3%. The model achieved a precision of 94%, a recall of 92%, and an F1 score of 93%. This indicates that the model can accurately identify potential churners while minimizing false positives. The features contributing most to customer churn were found to be contract type, tenure, monthly charges, and internet service. By identifying these factors, the telecommunications company can take proactive measures to retain its customers and improve its customer retention rates.
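As a sanity check on the reported metrics, precision, recall, and F1 can be recomputed directly from confusion-matrix counts; the counts below are hypothetical, chosen only to illustrate the calculation and approximate the figures above, not taken from the project's actual results.

```python
# Hypothetical confusion-matrix counts for a churn classifier
# (illustrative only; not the project's actual numbers).
tp, fp, fn = 920, 59, 80

precision = tp / (tp + fp)  # fraction of predicted churners that actually churned
recall = tp / (tp + fn)     # fraction of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.94 recall=0.92 f1=0.93
```

Because F1 is the harmonic mean, it sits between precision and recall but closer to the lower of the two, which is consistent with the 93% reported above.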

Conclusion: In conclusion, this project demonstrates how data analytics can be used to predict customer churn in a telecommunications company. By developing a predictive model, the company can identify potential churners and take proactive measures to retain them.


Vishnu Vardhan

1. Select food security data and use MapReduce or Apache Spark to evaluate industry-domain-specific data analytic goals. For MapReduce, this could be through direct means, such as Java or Python programs, or indirect means, such as Apache Pig or Apache Hive.
2. You will also need to define data analytic questions or goals for your project that are appropriate for the dataset you have chosen. These questions can mirror or be similar to those posed and answered in the peer-reviewed papers you researched for your literature survey of the domain surrounding your dataset. You can also find such domain-specific questions in published case studies or technical reports. Regardless, ensure the questions you pose are appropriate to the domain and valuable in their own right, based on your research into the MapReduce or Apache Spark work conducted by the industry or institution most relevant to your dataset. If you do this, the questions will be well suited to this project paper.
3. The methodology section should describe what you intend to do to answer these project questions and how you will analyze the results in the context of the domain specific to the dataset you have chosen.
4. Conduct your MapReduce or Apache Spark job on the AWS Hadoop cluster. You should be familiar with the specific tool you wish to use from your previous homework assignments. You will need to run the appropriate commands on the cluster to extract insights from the specific features of the dataset provided on Canvas.
5. Analyze your results and formulate answers to the questions raised earlier in your report, based on the data you extracted from the large dataset through your MapReduce or Apache Spark job.
6. Write your insights into your conclusion section, incorporating the answers you formulated and supported with the results of your work in the previous sections of this project report.
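The map/shuffle/reduce pattern the steps above rely on can be sketched without a cluster. The stdlib-only Python sketch below groups hypothetical food-security records by region and averages a metric per group, the same aggregation a MapReduce or Spark job would perform at scale; the field names and values are assumptions for illustration, not the schema of the Canvas dataset.

```python
from collections import defaultdict

# Hypothetical (region, food_insecurity_rate) records; the real Canvas
# dataset's schema will differ -- these names are illustrative only.
records = [
    ("south", 14.2), ("west", 9.8), ("south", 12.6),
    ("east", 11.0), ("west", 10.4), ("east", 12.2),
]

# Map phase: emit (key, value) pairs.
mapped = [(region, rate) for region, rate in records]

# Shuffle phase: group values by key (Hadoop/Spark does this for you).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each group -- here, the mean rate per region.
averages = {region: round(sum(rates) / len(rates), 1)
            for region, rates in groups.items()}
print(averages)  # prints: {'south': 13.4, 'west': 10.1, 'east': 11.6}
```

On the AWS cluster, the same logic would be expressed as a mapper/reducer pair (for Hadoop Streaming), a Pig or Hive script, or a Spark `groupBy` and aggregation; only the framework handling the shuffle changes, not the shape of the computation.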
