
Currently submitted to: Interactive Journal of Medical Research

Date Submitted: Jul 1, 2020
Open Peer Review Period: Jul 1, 2020 - Aug 26, 2020

NOTE: This is an unreviewed Preprint

Warning: This is an unreviewed preprint. Readers are warned that the document has not been peer-reviewed by expert/patient reviewers or an academic editor, may contain misleading claims, and is likely to undergo changes before final publication, if accepted, or may have been rejected/withdrawn (a note "no longer under consideration" will appear above).


Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Covid-19 Pandemic: Data Analysis and Forecasting using Machine Learning Algorithms

  • Sohini Sengupta
  • Sareeta Mugde



India reported its first Covid-19 case on 30th January 2020. Practically no significant rise in the number of cases was noticed in February, but from March 2020 onwards there has been a huge escalation, as has been the case in many other countries the world over. This research paper analyses Covid-19 data initially at a global level and then drills down to the scenario in India. Data is gathered from multiple sources, namely several authentic government websites. Variables such as gender, geographical location, and age have been represented using Python and data visualization techniques. Time series analysis and other pattern-recognition techniques are deployed to bring more clarity to the current scenario, as the analysis is based entirely on real-time data (till 19th June 2020). Finally, we use some machine learning algorithms to perform predictive analytics for the near-future scenario. We use a sigmoid model to estimate the day on which we can expect the number of active cases to reach its peak, and also when the curve will start to flatten. The strength of the sigmoid model lies in providing an estimated date; this is a unique feature of the analysis in this paper. Certain feature engineering techniques have been used to transform the data to a logarithmic scale, as it affords better comparison by removing data extremities or outliers. Based on its short-term predictions, our model can be tuned to forecast longer time intervals. Needless to say, many factors will influence the number of cases in the coming days, one being the extent to which citizens adhere to the rules and restrictions imposed by the Government.
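
The abstract does not state the functional form of the sigmoid model. As a minimal sketch, assuming a three-parameter logistic curve, the following shows how a day estimate can be read off such a curve, together with a log-scale transform of the kind the abstract mentions. The names `logistic`, `day_reaching`, `plateau`, `rate`, and `midpoint` are hypothetical, and the parameter values are illustrative only, not the paper's fitted values.

```python
import math

def logistic(day, plateau, rate, midpoint):
    """Three-parameter logistic (sigmoid) curve; an assumed form, not the paper's."""
    return plateau / (1.0 + math.exp(-rate * (day - midpoint)))

def day_reaching(fraction, rate, midpoint):
    """Day on which the curve reaches `fraction` of its plateau (0 < fraction < 1)."""
    # Solve plateau / (1 + exp(-rate * (d - midpoint))) = fraction * plateau for d.
    return midpoint + math.log(fraction / (1.0 - fraction)) / rate

# Illustrative parameters only (not fitted to any data).
plateau, rate, midpoint = 100_000.0, 0.08, 90.0

steepest_day = day_reaching(0.5, rate, midpoint)   # inflection point: fastest growth
flatten_day = day_reaching(0.99, rate, midpoint)   # curve within 1% of its plateau

# Log-scale transform: compresses large counts so series of different sizes compare.
counts = [logistic(d, plateau, rate, midpoint) for d in range(0, 181, 30)]
log_counts = [math.log10(c + 1.0) for c in counts]  # +1 guards against log(0)
```

In practice the three parameters would be estimated from observed case counts, for example by nonlinear least squares; the choice of a 99% threshold for "flattening" is likewise an assumption made here for illustration.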


Prediction of the number of positive Covid-19 cases in the next few months.


Machine Learning Model - Clustering Sigmoid Model


The model predicts a maximum of 258,846 active cases. The curve flattens by day 154, i.e. 25th September, after which it goes down and the number of active cases eventually decreases.
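
The abstract does not state which calendar date is counted as day 0. As a rough check, assuming the day count is zero-based and that 25th September refers to 2020, the implied start of the series can be recovered with simple date arithmetic (the variable names are illustrative):

```python
from datetime import date, timedelta

flatten_day = 154                 # day index reported in the abstract
flatten_date = date(2020, 9, 25)  # the year 2020 is an assumption

# Assumes day 0 is the first day of the series (zero-based counting).
implied_day_zero = flatten_date - timedelta(days=flatten_day)
print(implied_day_zero.isoformat())  # 2020-04-24
```

If the count is instead one-based, the implied start shifts one day later, to 25th April 2020.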


A lot of research is under way on vaccines, economic dealings, precautions and the reduction of Covid-19 cases. However, we are currently in a mid-Covid situation. India, along with many other countries, is still witnessing an upsurge in the number of cases at alarming rates on a daily basis. We have not yet reached the peak. Therefore curve flattening and downward growth are also yet to happen. Each day brings fresh information and a large amount of data. There are also many other predictive models using machine learning that are beyond the scope of this paper. However, at the end of the day it is only the precautionary measures that we as responsible citizens take that will help to flatten the curve. We can all join hands and strictly follow all rules and regulations. Maintaining social distancing and taking the lockdown seriously are the only keys. This study is based on real-time data and will be useful for key stakeholders, such as government officials and healthcare workers, in preparing a combat plan along with stringent measures. The study will also help mathematicians and statisticians to predict outbreak numbers more accurately.


Please cite as:

Sengupta S, Mugde S

Covid-19 Pandemic: Data Analysis and Forecasting using Machine Learning Algorithms

JMIR Preprints. 01/07/2020:22004

DOI: 10.2196/preprints.22004



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.