Phylogeny-Based Target Identification and Antiviral Drugs against Influenza A Virus
PRAGYA ANAND
ABSTRACT
Influenza has had no stable treatment since 1918, and 2018 marked a century of influenza. Every year, influenza-related morbidity and mortality are reported globally. US-CDC, WHO and health partners estimate that up to 650,000 deaths each year are associated with respiratory disease linked to seasonal flu. Influenza A virus has the unusual capability of infecting a broad range of hosts: more than 100 species of birds, both wild and domestic, domestic animals (pigs, horses, dogs etc.) and humans. Because of the high mutation rate of the influenza virus, the pattern of its adaptation to new hosts remains poorly understood. Given this host range, the virus can cross from one host and adapt to another with more fatal outcomes, and there is always a risk of infection from an unknown source, which can result in devastating loss of human life and economic failure. To protect against an unknown influenza virus, slaughter of domestic birds and swine remains an option for safeguarding the larger human population. The Influenza A virus genome consists of 8 RNA segments, which code for around 12-17 proteins during replication and translation in the host cell. Of these, polymerase basic 2, polymerase basic 1, polymerase acidic, nucleoprotein, hemagglutinin, neuraminidase, matrix protein 1 and matrix protein 2 form part of the structural composition of the virus. The other proteins are auxiliary, with roles in export, import, protecting virions from the host immune system and other activities; for a few proteins (translation products of different frameshifts of the PB1, PA, M and NS segments) the function is still not clear. Thus, targeting a major protein that does not have a role in interacting with host cellular mechanisms is a promising approach.
Influenza virus is regarded as a moving target. In this work a functionally important protein was chosen as the target: Multiple Sequence Alignment was performed to find the most conserved protein, and structural alignment was then performed. Together these indicated that the nucleoprotein can be a potent druggable target; it is also the most abundant protein produced by the virus and interacts directly with the viral genome. Global infection data suggested that the A/H3N2 subtype has had a severe impact in recent years, so the H3N2 subtype of Influenza A virus was targeted. Drug screening against the nucleoprotein of influenza virus yielded synthetic compounds. Apart from nucleoprotein-directed compounds, the FDA-approved neuraminidase inhibitor class of drugs (oseltamivir, zanamivir) is available, but almost all strains of Influenza H3N2 virus are considered resistant to these drugs. Since no 3D structure was available for the H3N2 subtype, docking studies were performed on a modelled structure. The modelled protein was validated by Ramachandran plot analysis, Z-score validation, and ERRAT graphical analysis of residue energy optimisation. Natural-compound-based screening was performed, in which CID 10836928 showed good binding affinity, satisfied the CMC drug-like properties, and complied with Lipinski's Rule of Five. ADMET properties were checked using the preADMET server: Caco-2 cell permeability was moderate, HIA (Human Intestinal Absorption) was high, i.e. above 90%, and the compound also passed plasma protein binding (PPB). Toxicity prediction showed that the lead-like compound is neither mutagenic nor carcinogenic. The ligand screening library consisted of 95 natural compounds and their derivatives/analogues, from which the best compounds were shortlisted: CIDs 6419887 (Hecogenin), 10836928 (Muzanzagenin), 11385250 (Asparacosin A), 49797769, 46881227, 107971 and 73607, all of which showed good binding affinity against the homology-modelled nucleoprotein.
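As an illustration of the Lipinski Rule of Five check mentioned above, the following is a minimal sketch rather than the procedure actually used in this work. It assumes the four descriptors have already been computed by some external tool; the `Descriptors` class, the function names and the example values are hypothetical and are not data for any compound in the screening library.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Pre-computed molecular descriptors (hypothetical container)."""
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the list of Rule of Five criteria that the compound violates."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen-bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen-bond acceptors")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """The rule is conventionally taken to allow at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Illustrative values only, not measured data for any real ligand.
    candidate = Descriptors(molecular_weight=430.6, log_p=3.2,
                            h_bond_donors=2, h_bond_acceptors=5)
    print(lipinski_violations(candidate), passes_rule_of_five(candidate))
```

Returning the list of violated criteria, rather than a bare pass/fail flag, makes it easier to see why a ligand was excluded when filtering a library.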
Since the virus mutates rapidly, current antiviral strategies (i.e. vaccines or antiviral drugs) fail to combat the currently circulating strains. Updating and producing candidate influenza vaccines is time consuming, and huge capital is invested annually, yet the chance of vaccine failure remains high. Thus, there is a need for a more potent therapy that, apart from vaccines, offers a more stable cure. This work presents target identification and sequence stability compared against the available 3D structures of the different proteins in the PDB database, and a docking analysis that identified PubChem ID 10836928 (Muzanzagenin) as a drug-like compound with higher binding affinity, as obtained from the AutoDock scoring function. The protein-ligand interaction analysis showed good interaction between the two, suggesting a more reliable molecule against influenza virus.
CHAPTER 1
1 Introduction
Influenza
Influenza is an acute respiratory infection (Paules et al., 2017, Krammer et al., 2018) caused by influenza viruses (Ahmad et al., 2015), which circulate in all parts of the world. It is a contagious illness in which the viruses infect the nose, throat, and sometimes the lungs. It can cause mild to severe illness, and at times can lead to death due to complications. The best way to prevent the flu is to get a flu vaccine each year and to practise good hygiene.
According to the World Health Organization (WHO), influenza epidemics occur annually and are estimated to result in 3 to 5 million severe cases and approximately 250,000 to 500,000 deaths worldwide (www.who.int/mediacentre/factsheets/fs211/en/). New estimates from US-CDC, WHO and GISRS (Global Influenza Surveillance and Response System) suggest that up to 650,000 people die each year from respiratory diseases linked to seasonal flu, a marked increase on the previous global estimate of 250,000-500,000 deaths, which dated from over 10 years ago (http://www.who.int/influenza/news-events/en/).
Fig. 1 Infected regions due to influenza virus: invasion of the upper respiratory system (α-2,6 bonds) and the lower respiratory system (α-2,3 bonds).
Influenza spreads around the world in a yearly outbreak, resulting in loss of life and economic damage. In the Northern and Southern parts of the world, outbreaks occur mainly in winter, while in areas around the equator outbreaks may occur at any time of the year. Colder countries are affected more than tropical and temperate regions, because influenza is believed to spread faster in the cold season (WHO Factsheet).
1.2 The Pathogens
There are 4 types of seasonal influenza viruses: types A, B, C and D. Influenza A and B viruses circulate and cause seasonal epidemics of disease. Influenza A viruses are further classified into subtypes according to the combinations of hemagglutinin (HA) and neuraminidase (NA), the proteins on the surface of the virus. The subtypes currently circulating in humans are A(H1N1) and A(H3N2). The A(H1N1) virus is also written as A(H1N1)pdm09, as it caused the pandemic in 2009 and subsequently replaced the seasonal influenza A(H1N1) virus that had circulated before 2009. Only influenza type A viruses are known to have caused pandemics. Influenza B viruses are not classified into subtypes, but can be broken down into lineages.
Currently circulating influenza type B viruses belong to either the B/Yamagata or the B/Victoria lineage. Influenza C virus is detected less frequently and usually causes mild infections, and so is not of public health importance. Influenza D viruses (Lucas et al., 2016) primarily affect cattle and are not known to infect or cause illness in people.
Signs and Symptoms of influenza infection
Influenza (also known as the flu) is a contagious respiratory illness caused by flu viruses. It can cause mild to severe illness, and at times can lead to death. The flu is different from a cold and usually comes on suddenly. People who have the flu often feel some or all of these symptoms: fever or feeling feverish/chills, cough, sore throat, runny or stuffy nose, muscle or body aches, headaches, and fatigue (tiredness). Some people may have vomiting and diarrhea, though this is more common in children than in adults.
1.3 Diagnosis
Influenza is usually diagnosed clinically. However, during periods of low influenza activity and outside epidemic situations, infection with other respiratory viruses (e.g. rhinovirus, respiratory syncytial virus, parainfluenza virus and adenovirus) can also present as influenza-like illness (ILI), which makes it difficult to differentiate influenza from other pathogens. Collection of an appropriate sample (throat swab or nasal discharge) and a diagnostic test are required for a definitive diagnosis. Proper collection, storage and transport of respiratory specimens is the essential first step in detecting influenza infection. Laboratory confirmation is performed on throat, nasal and nasopharyngeal secretions, or on tracheal aspirate or washings, by antigen detection, virus isolation, or detection of influenza-specific RNA by reverse transcriptase-polymerase chain reaction (RT-PCR).
Guidance on the laboratory techniques is published and updated by WHO.
1.3 Flu Complications
Most influenza cases resolve on their own within a few days to less than two weeks when the infection is confined to the upper respiratory tract (URT). In some cases, however, complications develop that are as serious as pneumonia, which can be caused by the influenza virus alone or by co-infection of virus and bacteria, and which can be life-threatening and result in death. Other possible serious complications triggered by flu are myocarditis (inflammation of the heart), encephalitis (brain inflammation), myositis or rhabdomyolysis (muscle tissue inflammation) and multi-organ failure (e.g. respiratory and kidney failure). A flu infection can trigger an extreme inflammatory response in the body and can lead to sepsis, the body's life-threatening response to infection. It can also worsen chronic problems; for example, people with asthma can suffer an asthma attack during a flu infection. Other complications of flu include bacterial pneumonia, ear infections, sinus infections (moderate infection), and worsening of chronic medical conditions such as congestive heart failure, asthma, or diabetes. Flu can infect people of any age, but serious problems occur mainly in people aged 65 years and above, people with chronic medical conditions (such as asthma, diabetes or heart disease), pregnant women, and young children.
1.4 Symptomatic differences between common cold and influenza
Diagnosing flu from symptoms alone is difficult, and flu is easily confused with other minor or major respiratory infections, but it is important to distinguish flu from a cold. Table 1 shows the differences that can help lay people tell flu from a cold and obtain adequate treatment in time.
Table 1 Differences in the symptoms of flu and common cold
Signs and Symptoms | Influenza | Cold
Symptom onset | Abrupt | Gradual
Fever | Usual; lasts 3-4 days | Rare
Aches | Usual; often severe | Slight
Chills | Fairly common | Uncommon
Fatigue, weakness | Usual | Sometimes
Sneezing | Sometimes | Common
Stuffy nose | Sometimes | Common
Sore throat | Sometimes | Common
Chest discomfort, cough | Common; can be severe | Mild to moderate; hacking cough
Headache | Common | Rare

Influenza spreads around the world in a yearly outbreak, resulting in about three to five million cases of severe illness and about 250,000 to 500,000 deaths. In the Northern and Southern parts of the world, outbreaks occur mainly in winter, while in areas around the equator outbreaks may occur at any time of the year. Death occurs mostly in the young, the old and those with other health problems. Larger outbreaks, known as pandemics, are less frequent. In the 20th century, three influenza pandemics occurred: Spanish influenza in 1918 (~50 million deaths), Asian influenza in 1957 (two million deaths), and Hong Kong influenza in 1968 (one million deaths). The World Health Organization declared an outbreak of a new type of influenza A/H1N1 to be a pandemic in June 2009. Influenza may also affect other animals, including pigs, horses and birds.
1.4 Transmission
Previous data have shown that the novel H1N1 is no longer restricted to swine populations and also infects humans. Earlier, it was known to spread among pigs and the people in contact with them, but interspecies transmission was later reported. The different strains of influenza are transmitted through the air, and so the flu has spread throughout the world with migratory populations.
Fig. 2.9 Host range of influenza A virus, showing the strains infecting each host and those crossing host barriers.
Interspecies transmission is shown as solid lines (direct transmission) and dotted arrows (sporadic or limited infection). Source: The ecology and adaptive evolution of influenza A interspecies transmission (Joseph et al., 2016).
Influenza remains active, and anyone in the vicinity can catch the infection. There are two types of transmission route, short-range and long-range (Peteranderl et al., 2016), according to the distance between the patient and the susceptible individual. In the short-range route, the patient exhales the virus into the air and a susceptible person inhales the infected air; in the long-range route, the virus passes between populations in distant locations. A mammalian influenza virus can stay alive in mucus for several hours, while an avian influenza virus can survive for more than a hundred days (Spickler A. 2016).
1.5 Current Prophylaxis
Patients with uncomplicated seasonal influenza are managed with symptomatic treatment and advised to stay at home to minimize the risk of spreading the infection to others in the community. Patients whose condition deteriorates should seek medical attention. High-risk patients should be treated with antivirals in addition to symptomatic treatment soon after infection. Patients with severe or progressive clinical illness associated with suspected or confirmed influenza virus infection should be given neuraminidase inhibitors (i.e. oseltamivir), prescribed within 48 hours of symptom onset, for 5 days or until satisfactory clinical improvement. All currently circulating influenza viruses are resistant to adamantane antiviral drugs (such as amantadine and rimantadine), so these are not recommended. Because immunity to new viruses is not developed, a flu shot is recommended every season; inactivated influenza vaccines are the most widely used.
The vaccines recommended for the upcoming season have a quadrivalent composition (http://www.who.int/influenza/vaccines/virus/recommendations/2018_19_north/en/): an A/Michigan/45/2015 (H1N1)pdm09-like virus; an A/Singapore/INFIMH-16-0019/2016 (H3N2)-like virus; a B/Colorado/06/2017-like virus (B/Victoria/2/87 lineage); and a B/Phuket/3073/2013-like virus (B/Yamagata/16/88 lineage). Every year WHO and its collaborating health organizations analyze specimens collected globally to predict the vaccine strains for the upcoming season. This is a time-consuming procedure and the predictions cannot be fully reliable, as in 2009, when the predicted vaccine did not cover the pandemic strain. In 2017 the flu vaccines were also less effective. Many strategies have been proposed to develop a Universal Flu Vaccine that targets the conserved regions of flu viruses and can therefore cover a wider range of flu viruses.
1.5 Current Impact of Influenza Globally and in India
Globally Flu Tracked
Fig. 2 Pictorial representation of influenza infection globally, as estimated in WHO reports. Source: https://hygimia69.blogspot.in/2017/03/influenzaglobal-update-no-284-based-on.html
The global report of influenza tracked in 2017 shows the infected regions of the world in different shades of yellow: the darker the shade, the higher the infection level in 2017, and the lighter the shade, the lower the infection level. The pie charts show the types of influenza virus that affected each region. The WHO data suggest that in 2017 the North American countries and northern Asia showed high flu activity during the flu season, while milder flu activity was observed in southern regions, including India.
Flu Statistics in India
In India, the National Centre for Disease Control maintains the overall statistics of infectious and other diseases, to which data from all over India are submitted monthly and annually.
Fig. 3 Graphical representation of influenza reported in India from 2010 to 2017. The blue line shows reported flu cases and hospitalizations, and the red line shows deaths due to influenza.
In India between 2010 and 2017, reported influenza morbidity was highest in 2015 and decreased in 2016, but in 2017 influenza returned with more deaths and hospitalizations than in the previous year. The data were collected from the "Integrated Disease Surveillance Programme", Ministry of Health & Family Welfare, Government of India (designed and developed by the Center for Health Informatics). The data for 2018 are pending. Deaths increased from 265 in 2016 to 2186 in 2017 (up to November 2017).
Objectives
Multiple Sequence Alignment based target identification and validation by sequence and structural alignment
Sequence based homology modelling and validation of the modelled structure
Screening of available drugs against the selected target of Influenza A virus
Literature based screening of antiviral natural compounds and ligand library preparation of derivatives and analogues
Molecular docking studies
ADMET property predictions of lead-like compounds
Protein-ligand interaction analysis with the lead compound
CHAPTER 2
Review of literature
2.1 Influenza
Influenza is a contagious disease caused by the influenza virus, which belongs to the Orthomyxoviridae family (Velthuis et al., 2016) (orthos means straight and myxa means mucus) (https://en.wikipedia.org/wiki/Orthomyxoviridae). The family contains genera such as Influenza A virus, Influenza B virus, Influenza C virus, Influenza D virus (Lucas et al., 2016, Anitha et al., 2017), Thogotovirus and Isavirus. Influenza A virus and Influenza B virus are the main causes of disease among humans and animals.
Influenza occurs either as epidemics (annual or seasonal) or as pandemics (global) (Peteranderl et al., 2016). The infection is not limited to humans: the main hosts of influenza A virus are water birds, in which the infection is asymptomatic and from which the virus spreads to domestic and poultry birds. In these birds the infection is symptomatic in the case of High Pathogenic Avian Influenza Virus (HPAIV; subtypes including H7N9, H5N1, H5N6) and asymptomatic in the case of Low Pathogenic Avian Influenza Virus (LPAIV; including subtype H7N2). Both are referred to as Avian Influenza, commonly called Bird Flu (CDC). Apart from birds and humans, influenza virus has a broad host range among vertebrates, viz. pigs (Swine Flu: H1N2v and H3N2v), horses (Equine Flu: H7N7, H3N8), dogs (Canine Flu: H3N8), cats (Feline Flu: H1N1), seals, whales etc. Thus, influenza can be enzootic (a disease regularly affecting the animals of a particular area in a particular season) or zoonotic (a disease that can be transmitted from animals to people) (WHO). Influenza is referred to as a winter disease (Peteranderl et al., 2016); according to WHO, this is believed to be because the mucosal layer of the respiratory tract is less moist in winter, which raises the chance of the influenza virus binding to its receptor. In temperate regions the epidemic season runs from December to April in the Northern Hemisphere and from June to September in the Southern Hemisphere, and low humidity is suggested to prolong virus shedding (Peteranderl et al., 2016). In tropical and subtropical regions the flu season is not clearly defined and infections may recur. Every year flu activity in both hemispheres peaks in winter, whereas in the tropical zone minor influenza activity is observed throughout the year. Influenza virus was discovered in pigs in 1931 by Richard Shope, and later in humans, in 1933, by a group headed by Patrick Laidlaw at the Medical Research Council of the United Kingdom.
The most lethal waves of influenza (WHO history of influenza), four of which were pandemics, were:
1918 - Spanish Flu (H1N1)
1957 - Asian Flu (H2N2)
1968 - Hong Kong Flu (H3N2)
2005 - Bird Flu (H5N1)
2009 - Swine Flu (H1N1)
2013 - Bird Flu (H7N9)
The Spanish flu of 1918 was the most lethal, killing more people than died in World War I. Documented data on the history of flu are available from the discovery of influenza virus in pigs in 1931 onwards. By 2018 a hundred years of influenza had passed, and the treatments available in 2017 were still not very effective, as reported by WHO statistics and news bureaus.
2.2 Influenza Virus
Influenza virus is an infectious agent belonging to group five of the virus classification (https://viralzone.expasy.org/223?outline=all_by_protein), in the Orthomyxoviridae family (Shen et al., 2015, Liu et al., 2016). The family comprises the genera "Influenzavirus A, Influenzavirus B, Influenzavirus C (Dawson et al., 2017), Influenzavirus D (Krammer et al., 2018), Thogotovirus, Isavirus and Quaranjavirus" (https://viralzone.expasy.org), which are negative-sense single-stranded RNA viruses (Dawson et al., 2017) in a double helical conformation (Huang S. 2014). Three main species, Influenza A virus (IAV), Influenza B virus (IBV) and Influenza C virus (ICV), cause infection in humans; of these, Influenza A viruses cause the most virulent infections in humans and are also a common cause of zoonotic infections (nature.com). Influenza viruses cause acute respiratory infection (Paules et al., 2017), commonly called "influenza" or "flu", among vertebrates. The flu has been recognised since the 16th century (Paules et al., 2017) and spreads rapidly as outbreaks in two forms globally: epidemics (seasonal or interpandemic) caused by influenza A and B viruses, and sporadic pandemics caused by influenza A viruses (Paules et al., 2017, Krammer et al., 2018).
Among humans and other vertebrates, the symptoms associated with influenza virus infection vary from a mild respiratory disease confined to the respiratory tract and characterised by fever, sore throat, runny nose, cough, headache, muscle pain and fatigue, to severe and in some cases lethal pneumonia of the lower respiratory tract, caused by the influenza virus itself or associated with secondary bacterial infection. Pregnant women, sick and elderly people and children below the age of 2 years are more prone to influenza infection. Influenza virus infection can also lead to a wide range of non-respiratory complications in some cases, affecting the heart, central nervous system and other organ systems (Krammer et al., 2018). Among birds, the infection can be symptomatic or asymptomatic, in the form of Highly Pathogenic Influenza Virus (HPIV) infection or Low Pathogenic Influenza Virus (LPIV) infection respectively, both of which are of zoonotic origin (Joseph et al., 2016). According to a Nature review article of 2016, the genome of influenza A virus is not yet properly understood and its structure is not known. The predicted arrangement of the viral genome is the (7+1) form (Dawson et al., 2017), which indicates that influenza A virus has a genome set of 8 negative-sense single-stranded RNA segments; influenza B virus likewise has eight segments, while Influenza C virus has a genome set of 7 negative-sense RNA segments. Influenza A virus is the most infectious because of the high rate of variation in its antigenic proteins, hemagglutinin and neuraminidase, which are membrane glycoproteins.
Because of antigenic variation, 18 hemagglutinin types (H1 to H18) and 11 neuraminidase types (N1 to N11) have been reported. These variations arise from antigenic drift, caused by the lack of proofreading capacity of the viral polymerase during replication, and from antigenic shift, caused by intermixing of viruses from different hosts; for example, an avian virus and a human (Homo sapiens) virus can exchange and mix genome segments within a mixing vessel, i.e. the pig (Sus scrofa). Antigenic shift may lead to a new pandemic, which is more fatal than the seasonal epidemics that result from antigenic drift, since the immune system retains no memory from previously circulating strains. These discoveries suggest a very complex mechanism of influenza virus evolution, of which only a little is known.
Fig Structure of influenza virus showing major components
Subtype Nomenclature
The naming of influenza viruses follows an internationally accepted convention, which was accepted by WHO in 1979 and published in 1980 in the Bulletin of the World Health Organization, and which is followed globally. The approach uses the following components:
The antigenic type (e.g., A, B, C)
The host of origin (e.g., swine, equine, chicken, etc.; for human-origin viruses, no host of origin designation is given)
Geographical origin
Strain number
Year of isolation
For influenza A viruses, the hemagglutinin and neuraminidase antigen description in parentheses (e.g., (H1N1), (H5N1))
Fig Nomenclature system of Influenza Virus
2.3 Molecular Structure of the influenza virus
Under the electron microscope, influenza viruses appear spherical, but a filamentous shape has also been reported. The size of the virus is 80 to 120 nm.
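To make the naming convention from the Subtype Nomenclature section concrete, the sketch below splits a strain name into the components listed there. It is an illustrative parser written for this purpose, not an official tool: the `StrainName` type, the `parse_strain_name` function and the regular expression are assumptions, and only the plain "type/host/location/number/year (HxNy)" form is handled (suffixes such as "pdm09" or "-like virus" are not).

```python
import re
from typing import NamedTuple, Optional


class StrainName(NamedTuple):
    antigenic_type: str        # A, B, C or D
    host: Optional[str]        # None for human-origin viruses
    location: str              # geographical origin
    strain_number: str
    year: int                  # year of isolation, as written in the name
    subtype: Optional[str]     # e.g. "H3N2"; given for influenza A only


# Slash-separated fields, optionally followed by "(HxNy)".
_PATTERN = re.compile(
    r"^\s*(?P<fields>[^()]+?)\s*(?:\((?P<subtype>H\d+N\d+)\))?\s*$"
)


def parse_strain_name(name: str) -> StrainName:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised strain name: {name!r}")
    fields = [part.strip() for part in match.group("fields").split("/")]
    if len(fields) == 4:       # human origin: no host field
        antigenic_type, location, number, year = fields
        host = None
    elif len(fields) == 5:     # non-human origin: host is the second field
        antigenic_type, host, location, number, year = fields
    else:
        raise ValueError(f"expected 4 or 5 '/'-separated fields: {name!r}")
    # Older strains use two-digit years (e.g. "87"); the value is kept as written.
    return StrainName(antigenic_type, host, location, number,
                      int(year), match.group("subtype"))


if __name__ == "__main__":
    print(parse_strain_name("A/Singapore/INFIMH-16-0019/2016 (H3N2)"))
    print(parse_strain_name("B/Phuket/3073/2013"))
```

The host field is optional because, under the convention, human-origin viruses carry no host designation, so a four-field name is read as a human isolate.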
Fig Section of influenza virus under the electron microscope, labelled with the genomic material, the spike-like membrane proteins studded in the envelope, the lipid envelope and the M2 protein. Credit: photo courtesy of Yoshihiro Kawaoka.
The influenza virus is an enveloped virus containing a segmented negative-sense single-stranded RNA genome in a double helical conformation (Shih et al., 2014). To date, three species (or types) of influenza have been identified: influenza A, B, and C, all of which can cause disease in humans, although the dominant species causing human disease are influenza A and B (WHO). Influenza A and B are similar in genomic structure, which is composed of eight RNA segments, whereas the genome of influenza C is composed of seven RNA segments (Shih et al., 2014). For influenza A, the eight RNA segments encode at least 15 viral proteins: the envelope glycoproteins (hemagglutinin HA and neuraminidase NA), the proton channel protein (M2), the trimeric RNA-dependent RNA polymerase complex (full-length PB1, PB2, and PA, and five shorter protein variants PB1-F2, PB1-N40, PA-X, PA-N155, and PA-N182), matrix protein (M1), nucleoprotein (NP), nuclear export protein (NEP, originally named NS2), and nonstructural protein (NS1) (Shih et al., 2014). Because of the limited coding capacity of the influenza genome, the virus employs several strategies, such as alternative splicing of precursor viral mRNA, leaky ribosomal scanning, and ribosomal frame-shifting, to synthesize these viral proteins. The proteins have also emerged as multi-functional, facilitating the viral infection cycle and maintaining the integrity of the virion structure. In contrast to influenza A, only 11 viral proteins (the same eight full-length counterparts with three shorter protein variants BM2, NEP, and NB) have been identified in influenza B (Shih et al., 2014).
Nine viral proteins (the same six full-length counterparts and hemagglutinin-esterase fusion HEF, the HA and NA counterpart, with two shorter variants CM2 and NEP) have been identified in influenza C (Hause et al. 2013; Matsuzaki et al. 2003; Nakada et al. 1986; Alamgir et al. 2000). It is possible that more short protein variants exist in influenza B and C, as in influenza A virus, but further studies are needed to determine this. In summary, within the influenza virion architecture, HA, NA (or HEF in influenza C virus) and M2 (and NB in influenza B virus) are envelope proteins with transmembrane domains anchored in the lipid envelope, underneath which lie M1 proteins that connect the viral genome and internal proteins to the surface proteins. Each RNA segment is encapsidated by multiple NP proteins, which form a complex with the polymerase complex (PB1, PB2, and PA) termed the viral ribonucleoprotein (vRNP). Small amounts of NEP are located in the viral particle and are thought to interact with M1, which anchors all the structural components of the virus (Paterson and Fodor 2012). Other viral proteins, such as NS1 and shorter polymerase variants such as PB1-F2, are translated during viral infection in the host cell but not packaged into the virion (Shaw et al., 2011). Table 3 portrays the comparison between the four types of influenza viruses.
2.3.1 Comparison among different Influenza virus type
Table 2 Differences among influenza virus types A, B, C and D
Character | IAV | IBV | ICV | IDV
No. of genomic segments | 8 | 8 | 7 | 7
Protein peptide symbols | PB2, PB1, PB1-F2, PA, PA-X, HA, NP, NA, M1, M2, NS1 and NS2 | PB2, PB1, PA, HA, NP, NA, NB, M1, BM2, NS1 and NS2 (Krammer et al., 2018) | PB1, PB2, P3, HE, NP, M1, CM2, NS1 and NS2 (NCBI Flu DB) | –
Common functions | Three membrane proteins (HA, NA, M2), a matrix protein (M1) just below the lipid bilayer, a ribonucleoprotein core (consisting of 8 viral RNA segments and three proteins: PA, PB1, PB2), and the NEP/NS2 protein | Four proteins in the envelope: HA, NA, NB, and BM2. Like the M2 protein of influenza A virus, the BM2 protein is a proton channel that is essential for the uncoating process. The NB protein is believed to be an ion channel, but it is not required for viral replication in cell culture | A minor viral envelope protein is CM2, which functions as an ion channel. The envelope glycoprotein is called HEF (hemagglutinin-esterase-fusion) because it has the functions of both the HA and the NA | –
Host range | >100 bird species, animals, humans, bats etc. (Sautto et al., 2018) | Only humans (Sautto et al., 2018) | Humans and pigs (asymptomatic or very mild infection) (Sautto et al., 2018) | Reported among cattle and not known to infect humans (Sautto et al., 2018)
Type | H (1-18) and N (1-11) | No subtypes, but divided into lineages and strains: B/Victoria and B/Yamagata | – | –
Spectrum of disease | Unpredictable coverage among birds | Same as IAV but does not cause pandemics; the limited host range of the virus (humans and seals) limits the generation of new strains by re-assortment | |

2.3.1 Replication and Infection Cycle
Viruses need a host system to initiate their life cycle. Influenza virus, a negative-sense single-stranded RNA virus, replicates in the nucleus of the host cell (Wang et al., 2014).
Fig. 2.4 Replication cycle of influenza virus along with transcription and translation.
Source: https://www.nature.com/articles/s41572-018-0002-y
NA has also been postulated to aid in the initial internalization of the virus by removing decoy receptors on mucins and cilia in the extracellular region of epithelial cells in the respiratory system (Matrosovich et al. 2004b; Su et al. 2009), and to enhance the movement of virus bound to the cell surface towards suitable sites for endocytosis (Ohuchi et al., 2006). Fusion of the membranes does not allow the release of vRNPs into the cytoplasm if they do not dissociate from M1. Within the endosomes, the acidity of the environment prompts the proton channel M2 to pump hydrogen ions into the interior, where the acidity disrupts the interaction of vRNPs with M1 and allows the viral genome to enter the cell cytoplasm upon membrane fusion (Shaw 2011; Bouvier and Palese 2008). The NB protein of influenza B virus also forms an ion channel, but its precise role during the infection cycle is still not clear. It has been suggested that it does not provide the function of acidification to the virion (Liang and Li 2010), although it is indispensable for efficient viral replication in mice (Hatta and Kawaoka 2003). Once liberated, vRNPs are transported into the nucleus through the interaction of NP with several cellular proteins and begin viral transcription and replication. Influenza, unlike most other vertebrate RNA viruses, which replicate their genomic material in the cytoplasm, requires a cellular nucleus for viral transcription and replication (Hutchinson and Fodor 2012). This unique feature enables influenza to expand its genomic coding capacity by exploiting the nuclear alternative splicing machinery to generate transcripts for M2 and NEP, and it also contributes to influenza virulence (elaborated in a later section).
It is thought that viral replication follows transcription and that the newly synthesized viral genome is coated with newly synthesized NP to evade detection by intracellular pattern recognition receptors such as retinoic acid-inducible gene 1 (RIG-I).

2.3.2 Proteins of Influenza A Virus and their Functions
(Velthuis et al., 2016 and https://www.uniprot.org/uniprot/?query=INFLUENZA+A+VIRUS+&sort=score)

Table 2.2 Important Influenza Proteins and Functions
Protein (length in amino acids) | Function
Viral RNA-dependent RNA polymerase (vRdRP) | Binds to the ends of each (-) ssRNA segment and synthesizes new copies of the viral RNP. The subunits combine after import into the nucleus. The vRdRP consists of three main subunits (PB1, PB2 and PA) and, peripherally, a matrix of NP.
PB2 (759) | Responsible for cap binding. The globular domain is essential for proper association with host importin protein.
PB2-S1 (508) | An alternative splicing product of PB2; appears to localize in the mitochondria and inhibit the RIG-I-dependent interferon signaling pathway (innate immunity).
PB1 (757) | Involved in capturing the cap regions of the host's mRNA and inserting the primer into the viral mRNA. Holds the polymerase active site and harbors endonuclease activity.
PB1-F2 (87-90) | From an alternative reading frame. Can induce apoptosis, regulate the host interferon response and modulate susceptibility to bacterial superinfection. May influence intracellular localization of PB1.
PB1-N40 | N-terminally truncated version of PB1; a product of an in-frame downstream initiation site. The function is unknown, but it might modulate virus-induced pathogenesis.
PA (716) | Functions as an RNA endonuclease. Cleaves capped RNA fragments off the host's pre-mRNA to be used as primers for constructing viral mRNA.
PA-X (252, of which 61 are frameshifted) | Frameshifted PA at 191-252 (H7N7), postulated to play an important role in virus replication and shutdown of host innate responses in animal models, but its expression during in vivo infection has not been observed.
PA-N155 (561) | Ribosomal frameshift to AUG start codon at position 155; possible role in viral replication, but function unknown.
PA-N182 (534) | Ribosomal frameshift to AUG start codon at position 182; function unknown.
Hemagglutinin | The outer glycoprotein that binds sialic acid of epithelial cells and plays a central role in the fusion process.
Nucleoprotein | Binds the ssRNA into a large double-helical (NP protein) structure and serves to regulate the export and import of viral RNPs.
Neuraminidase | Helps the virion cut through the mucous coating of epithelial cells. Also thought to be important during the budding process, where the newly forming virion breaks away from the host cell.
M1 (252) | Full-length structure. Involved in regulating the import and export of the viral RNP. A key regulator of viral assembly, preferentially binding viral RNPs during viral assembly.
M2 (97) | Alternative splice product. Combines in the form of a tetramer in the viral envelope, where it regulates the flow of protons into the virion after the virus has entered the cell by endocytosis and before release of the viral RNPs.
M3 (9) | Alternative splice product; function unknown.
M4 (54) | Alternative splice product; function unknown.
M42 (99) | Alternative splice product; function not fully established; however, it can serve in the place of M2.
NS1 (230) | Full-length structure. Inhibits the interferon-mediated antiviral response. The NS1 protein of IAV serves a critical role in suppressing the production of host mRNAs by inhibiting the 3'-end processing of host pre-mRNAs, consequently blocking the production of host mRNAs, including interferon-β mRNAs. Also involved in the import of the viral RNPs; it helps hijack the import mechanism using importin alpha.
NEP, formerly NS2 (98) | Important for both the import and export of viral RNPs and mRNA copies between the nucleus and the cytosol.
NS3 | Function unknown, but may be an important protein factor for adaptation to new hosts.

PB1-F2: Recently, a novel 87-amino acid influenza A virus protein with proapoptotic properties, PB1-F2, has been reported. It originates from an alternative reading frame in the PB1 polymerase gene and is encoded in most known human influenza A virus isolates.

HA: IAV (H3N2) viruses have circulated in humans since 1968, and antigenic drift of the hemagglutinin (HA) protein continues to be a driving force that allows the virus to escape the human immune response. Since the major antigenic sites of the HA overlap with the receptor binding site (RBS) of the molecule, the virus constantly struggles to adapt effectively to host immune responses without compromising its functionality.

Polymerase Basic 2: PB2 is the product of segment 1 (gene PB2). It plays a role in transcription initiation and cap-snatching: it recognizes and binds the 7-methylguanosine cap of cellular pre-mRNAs, which are then used as primers for viral transcription.

2.4 Virus Particle and receptor interaction
A virus's life cycle relies on host machinery to initiate the viral infection cycle. Viruses need to introduce their genome into the host cell.
In the case of influenza A virus, infection begins with the binding of the virus particle to the host cell receptor through HA (specifically the HA1 domain). The virion is then taken up by endocytosis through a clathrin-dependent or clathrin-independent pathway PP, during which the pH of the endosomal environment gradually decreases. The acidity (particularly at the late endosomal stage) causes conformational changes in HA that reveal its stem region (HA2 domain), which helps to merge the viral envelope with the endosomal membrane.

Fig. Hemagglutinin binding to its receptor (sialic acid on epithelial cells: via α-2,6 glycosidic linkage in the mucosal layer of the upper respiratory tract in humans, via α-2,3 glycosidic linkage in the intestinal tract in birds, and via both in pigs), showing the fusion of the viral lipid membrane and the host cell membrane. Conformational changes in the hemagglutinin are caused by the lowering of pH inside the early endosome, due to the transfer of hydrogen ions through the ion channel formed by matrix protein 2. (http://pdb101.rcsb.org/motm/76)

After the attachment of the antigen to its specific receptor, the infection cycle starts with the transfer of the negative-sense viral ribonucleoprotein (vRNP) to the cytoplasm and then to the nucleus of the host cell. Unlike most other RNA viruses, influenza virus replicates inside the nucleus. Replication and transcription of the vRNA yield cRNA and mRNA, which act as templates for the formation of vRNA and proteins respectively, using host machinery.

Viral Ribonucleoprotein (vRNP)
The proteins that interact with the genomic strands of the influenza virus (nucleoprotein, polymerase basic 2, polymerase basic 1 and polymerase acidic), together with the RNA strands, form the ribonucleoprotein particle (vRNP). The RNP together with the capsid protein, i.e. matrix protein 1, forms the nucleocapsid. The combined function of the polymerase proteins is that of a transcriptase.

Fig.
2.6 The viral ribonucleoprotein and its components (Nucleoprotein, Polymerase Basic 2, Polymerase Basic 1 and Polymerase Acidic). Source: http://cronodon.com/BioTech/Virus_Tech_3.html

2.5 NP as target
The crystal structure of monomeric nucleoprotein is crescent-shaped, showing head, body and tail regions. It is a homotrimeric protein in which the inner groove region is where the genome strands pass through. The NP tail region is involved in genome replication and RNP assembly. Each NP bead interlocks with its neighbour through the tail and groove regions. If the nucleoproteins are phosphorylated they cannot interact with one another, i.e. self-association is prevented (Mondal et al., 2015). Genome segment 5, i.e. NP, plays a key role in interspecies transmission of IAV, particularly in the switching of the virus from avian to mammalian hosts (Joseph et al., 2016). The viral NP is associated with the sensitivity of IAV to host myxovirus resistance A (MxA) proteins, an important intrinsic antiviral factor of mammals known to inhibit IAV as well as other RNA and DNA viral infections (https://www.uniprot.org/uniprot/P03466). Variation in NP reduces the sensitivity of the virus to MxA, which removes the species barrier and enhances viral replication 10-fold (Joseph et al., 2016). This indicates a unique property of IAV: it can overcome the species barrier and move from an avian host to replicate successfully in the mammalian system through variation in the NP protein, which makes NP a fascinating target protein.

Fig. 2.7 Nucleoprotein homotrimeric structure showing the groove through which each segment of the influenza genome passes. The inner groove region contains the residues that interact with RNA. The NP monomers appear like beads (NP protein) on a garland (RNA segment). Source: ScienceSourceImage (C0355970-WSN_A_influenza_nucleoprotein_complex)

NP is a multifunctional protein of influenza A virus.
The function of the nucleoprotein is not limited to transcription and replication of the influenza A virus but also extends to many roles in the viral life cycle. As a structural protein with no intrinsic enzymatic activity, it is the most abundant viral protein in infected cells. NP is a critical component of the vRNP complex, and the recognized functions of NP include, but are not limited to, organization of RNA packing, nuclear trafficking, and vRNA transcription and replication. Monomeric NP is a 56 kDa protein (Hu et al., 2017) with a patch of basic residues capable of binding single-stranded RNA. Each NP bead is capable of binding 24 bases of RNA. NP beads assemble into NP oligomers through a flexible tail loop, which can insert into the neighbouring NP monomer. NP also interacts directly with the PB1 and PB2 subunits of the viral polymerase. According to cryogenic electron microscopy studies of recombinant vRNPs, the minimal functional unit of the RNP comprises one viral polymerase and nine NP monomers arranged in a rod-shaped structure.

Aside from providing structural support for viral RNA and polymerase, NP also contains nuclear localization signals (NLS) and nuclear export signals (NES). Both sequences are critical in mediating the trafficking of the vRNPs in and out of the nucleus through active transport during the viral replication cycle. Due to the huge size of vRNPs (100 to 200 Å), direct diffusion through the 9 Å nuclear pores is not possible. The cellular proteins that mediate import of vRNPs include importin-α, which binds to NP through the NLS sequence.
In the mature virion, vRNPs are attached to the inner layer of the viral membrane through interaction with matrix protein M1 molecules. Dissociation of the vRNP from M1 is a prerequisite for vRNP translocation into the nucleus, as it exposes the NLS. In the late stage of viral replication, vRNPs are transported back to the cytoplasm to be incorporated into the progeny virion, and this process is mediated by the interaction of a protein assembly comprising the cellular chromosome region maintenance 1 receptor (CRM1), M1, NEP, and vRNP.

Additionally, NP is critical for viral genome transcription and replication. The influenza virus produces three types of viral RNA: the messenger RNA (mRNA), the viral genome RNA (vRNA), and the complementary positive-sense RNA (cRNA). The cRNA is an intermediate in the replication of vRNA. The production of all three RNA species requires a functional NP (Mondal et al., 2015).

NP appears to be a promising target because it has no host cellular counterpart, so a high selectivity index could be achieved for NP inhibitors. Moreover, NP is highly conserved among IAV from different species, implying a lower chance of resistance to NP inhibitors. The tail loop region of NP has been proposed as a druggable region (Abbas et al., 2013). Thus, NP inhibitors appear to be one of the most promising classes because of their potent antiviral efficacy and broad-spectrum antiviral activity. Overall, NP represents a high-profile drug target with multiple druggable binding sites (Mondal et al., 2015).

Currently, vaccination is the most prominent way to fight the disease, for which WHO recommends the strains to be used as candidate vaccine viruses.
Every year WHO recommends two sets of vaccines, one for the northern hemisphere (in the month of PP) and one for the southern hemisphere (in the month of PP), based on about six months of analysis of the currently circulating strains. Covering the complete population is not easy, and if the prediction is not correct the chance of vaccine failure is high. In 2017, for example, two types of vaccine (trivalent and quadrivalent) were recommended, but the efficacy reported by WHO was inadequate: the efficacy of the vaccine against H1N1 was around 40%, while that for H3N2 was only 20%. Apart from vaccines, the other FDA-approved antivirals fall into two classes: neuraminidase based and ion channel based. These two classes are shown in Table 2.3: M2 channel inhibitors target the early phase of infection, and neuraminidase inhibitors target the late phase, when the new progeny viruses are released.

Table 2.3 FDA-approved antiviral drugs of the two classes
Drug class | Drug name | FDA approval | Reported mutation
Neuraminidase based | Oseltamivir (Tamiflu) | Approved | E276D (H3N2), H274Y, R222Q
Neuraminidase based | Peramivir (Rapivab) | Approved | H274Y, R222Q
Neuraminidase based | Zanamivir (Relenza) | Approved (poor bioavailability) | E276D (H3N2)
Adamantane based | Amantadine, Rimantadine | Not recommended | L26F, V27A, S31N

Another class of drugs targets the influenza virus nucleoprotein and inhibits the formation of vRNPs by acting on NP oligomerization. Naproxen was the first NP inhibitor discovered in silico against H1N1 and H3N2 strains; it targets the RNA binding groove of NP and prevents its interaction with RNA (Shen et al., 2015). Nucleozin was identified in 2010 and shows both early-stage and late-stage inhibitory effects on the influenza virus life cycle. Ingavirin is a drug licensed in Russia against influenza virus (Paules et al., 2017), for which the mode of action is not clear.
The inhibitory compounds against NP are listed in Table 2.4, which comprises standard drug compounds and ligands from the previously cited literature. They were explored through the PDB database (Ligand Explorer), DrugBank, PubChem and BindingDB.

Table 2.4 Compounds against Nucleoprotein from literature and few compounds used as drug
S.no. | Compound name | Molecular formula | Reference
1 | Nucleozin (CID 2863945) | C21H19ClN4O4 | Ahmad et al., 2016; Shen Z. et al., 2015; Gasparini et al., 2014
2 | (+)-(S)-2-(6-methoxynaphthalen-2-yl)-propanoic acid, Naproxen (CID 156391) | C14H14O3 | Ahmad et al., 2016; Shen Z. et al., 2015; Gasparini et al., 2014
3 | 1H-1,2,3-triazole-4-carboxamide, CBX (CID 21307) | C12H13NO2S | Ahmad et al., 2016
4 | 4-(2-chloro-4-nitrophenyl)piperazin-1-yl3-(2-chloropyridin-3-yl)-5-methyl-1,2-oxazol-4-ylmethanone, OMM | C20H17Cl2N5O4 | Ahmad et al., 2016
5 | 4-(5-bromanyl-3-methyl-pyridin-2-yl)piperazin-1-yl-3-(2-chlorophenyl)-5-methyl-1,2-oxazol-4-ylmethanone, OMS | C21H20BrClN4O2 | Ahmad et al., 2016
6 | 4-(2-chloro-4-nitrophenyl)piperazin-1-yl3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylmethanone, LGH | C22H21ClN4O5 | Ahmad et al., 2016
7 | N-4-chloranyl-5-4-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylcarbonylpiperazin-1-yl-2-nitro-phenylfuran-2-carboxamide, BMS-885986 | C27H24ClN5O7 | Ahmad et al., 2016
8 | N-4-chloranyl-5-4-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylcarbonylpiperazin-1-yl-2-nitro-phenylpyridine-2-carboxamide, BMS-885838 | C28H25ClN6O6 | Ahmad et al., 2016
9 | N-4-chloranyl-5-4-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylcarbonylpiperazin-1-yl-2-nitro-phenylthiophene-2-carboxamide, BMS-883559 | C27H24ClN5O6S | Ahmad et al., 2016
10 | Ingavirin (9942657), 5-2-(1H-imidazol-5-yl)ethylamino-5-oxopentanoic acid | C10H15N3O3 | Shen Z. et al., 2015; Gasparini et al., 2014

Neuraminidase-based drugs are given to patients who are infected by the influenza virus, but new strains of the virus develop resistance through mutations in the neuraminidase protein. Adamantane-based drugs are no longer recommended because of the resistance of influenza virus against them. Thus, there is no stable cure for the viral infection.

2.6 Challenges in Influenza Vaccines
Although the currently licensed influenza vaccines (IIV and LAIV) are effective in healthy young adults (Houser et al., 2017), several challenges remain. They include the dependence on embryonated eggs for vaccine production, which carries a risk of allergic sensitivity, the lengthy timeline for vaccine production, the need for annual vaccination, the emergence of antigenically novel viruses, the need for improved immunogenicity in the elderly, and the need for an improved correlate of protection. Several approaches have been developed to overcome these challenges and improve the immunogenicity and efficacy of influenza vaccines.
Every year WHO recommends vaccine strains for both hemispheres before the seasonal flu, but the chance of failure of the recommended vaccine is still very high if the vaccine virus does not match the currently circulating virus.

2.7 Causes of Mutation: Antigenic Shift and Antigenic Drift

2.7.1 Antigenic Drift
Antigenic drift is a continuous process in which changes occur gradually and slowly in the genes encoding the surface proteins (HA and NA). It takes place through point mutations and is an unpredictable phenomenon. Mutations include deletions, substitutions and insertions. Point mutations arise because the viral polymerase (RdRp) lacks proofreading capacity. Each mutation causes only a minor change in the surface proteins, but as a result previously available countermeasures (either flu shots or antiviral drugs) become less effective. Antigenic drift produces antigens that may not be recognized by antibodies raised against earlier influenza strains. This is one of the main reasons why people can suffer from flu infection more than once.

Fig. 2.8 Illustrating Antigenic Drift and Antigenic Shift; Source: WHO Collaborating Centre for References and Research on Influenza

2.7.2 Antigenic Shift
Antigenic shift is a non-continuous, occasionally occurring process. It is due to the exchange of genes from one virus to another, which can change virulence. Since influenza A has a segmented genome, progeny viruses can carry segments from different parent viruses. The process is also referred to as "reassortment": influenza A viruses of different subtypes from different host origins coincide in the same host, and swine are generally reported as the mixing vessel. In pigs, both α-2,6 (human origin) and α-2,3 (avian origin) linkages are present, so re-assortment of viruses from different species can occur, resulting in new and more pathogenic viruses against which no previous therapeutics are effective.
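To illustrate why a segmented genome makes reassortment so productive, the following minimal sketch enumerates the segment combinations that are possible when two parent viruses co-infect the same cell. The parent labels and the function name are hypothetical and chosen only for illustration; the sketch counts combinations and says nothing about which reassortants are viable.

```python
from itertools import product

# The eight genome segments of influenza A virus.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def reassortant_genotypes(parent_a="avian", parent_b="human"):
    """Return every possible assignment of the eight segments to two parents."""
    genotypes = []
    for origins in product((parent_a, parent_b), repeat=len(SEGMENTS)):
        genotypes.append(dict(zip(SEGMENTS, origins)))
    return genotypes


genotypes = reassortant_genotypes()
# Genotypes that draw segments from both parents are the novel reassortants.
novel = [g for g in genotypes if len(set(g.values())) == 2]
print(len(genotypes), len(novel))  # 256 254
```

With two parents there are 2^8 = 256 segment combinations, of which 254 mix segments from both parents.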
Earlier publications report that antigenic shift occurs roughly every 10 to 30 years. Such a "shift" occurred in the spring of 2009, when a new H1N1 virus with a new combination of genes emerged to infect people and quickly spread, causing a pandemic. Antigenic shift therefore poses the greater risk.

2.8 In-silico approaches

2.8.1 Phylogeny
Phylogenetic trees are commonly used as a visualization tool to help reveal the relationships among homologous sequences. When the number of sequences is limited, the relationships can be clearly observed from the tree; however, when more than a few thousand sequences are to be included, not only does the accuracy of the inferred phylogenetic trees decrease, but it also becomes increasingly difficult to study the resulting trees and find patterns, and the computational demands of building a huge phylogenetic tree tend to be staggering. Researchers usually build a tree by sampling a small amount of data rather than constructing a complete tree using the entire dataset. However, the sampling is generally done according to the experience of the researcher and is sometimes arbitrary. The conclusions drawn from such trees may be biased.

2.8.2 The Lipinski rule of five
The rule is important in drug development, where a pharmacologically active hit or lead structure is optimized stepwise for increased activity and selectivity, as well as for the drug-like properties described by Lipinski's rule. Modification of the molecular structure often leads to drugs with higher molecular weight, more rings, more rotatable bonds and higher lipophilicity. Lipinski's rule of five states that, in general, an orally active drug has no more than one violation of the following criteria:
Not more than 5 hydrogen bond donors (nitrogen or oxygen atoms with one or more hydrogen atoms)
Not more than 10 hydrogen bond acceptors (nitrogen or oxygen atoms)
A molecular weight under 500 Daltons
A partition coefficient log P
less than 5

Over the past decades, Lipinski's profiling tool for drug-likeness has led scientists to extend such profiling to the lead-like properties of compounds, in the hope that a better starting point in early discovery can save time and cost. Note that all the numbers are multiples of five, which is the origin of the rule's name.

Improvements in Lipinski's rule of five:
Molar refractivity from 40 to 130
Molecular weight from 160 to 480
Number of heavy atoms from 20 to 70

2.8.3 Docking theory
The original concept of docking comes from the "lock and key" concept of rational drug design, but the precise algorithms used to fit the "key" (the ligand) into the "lock" (the receptor protein) vary across programs (Chen Yu 2014). Recent developments include docking web servers, screening software and screening web servers, which reflect the number of new algorithms introduced in recent years. The most commonly used docking programs are AutoDock and GOLD, although there are newer programs that are more accurate. Their popularity and high citation rates are largely because they are freely available and were created earlier than other programs (Chen Yu 2014).

The aim is to find the lead molecule or ligand that interacts best with the selected target protein or receptor. This can be achieved by computational prediction, without spending large amounts of money and time on tedious laboratory work. This computational process is called molecular docking. Molecular docking is a computational tool used to predict possible ligand-receptor interactions. The programs help to evaluate all feasible binding pockets of a lead candidate on its target macromolecule, where the receptor can be a protein with a known 3D structure, a nucleic acid, etc. The individual binding modes predicted for a ligand-receptor combination are known as binding poses.
The poses include both the position of the ligand relative to the receptor and the conformational state of the ligand. A docking procedure involves the following tasks:
Characterization of the binding site
Orientation of the ligand in the binding site
Evaluation of the strength of interaction for a specific ligand-receptor pose ("scoring")

Sampling algorithm
The search space consists of all the possible orientations and conformations between the two molecules (six degrees of translational and rotational freedom along with the conformational degrees of freedom) (Meng et al 2011). Various sampling algorithms have been developed for conformation generation. AutoDock uses genetic algorithms (GA), whose key concept comes from Darwin's theory of evolution. Each degree of freedom of the ligand is treated as a gene, and these genes make up a chromosome, which represents the pose of the ligand. It is a stochastic method.

Scoring functions
Scoring functions are used to distinguish the correct pose from the many incorrect poses. They provide an estimate rather than an exact calculation of the binding affinity between the two molecules. Force-field based scoring considers hydrogen bonds, solvation and entropy contributions.

Types of docking
Rigid (Lock and Key): This represents the lock and key model, where the protein/receptor is the "lock" and the ligand/lead is the "key". Here the internal geometries of the ligand and the protein are kept fixed during docking, and the aim is to find the correct orientation of the "key" that will open the "lock".
Flexible (Induced Fit): In flexible docking, the ligand is kept flexible and the energies of the different ligand conformations fitting into the protein are calculated. Although this method is time consuming, it evaluates many possible conformations, which makes it more reliable. During this process the ligand and the protein adjust their conformations to achieve an overall "best fit". The conformational adjustments result in an overall binding pattern called "induced fit".
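The gene and chromosome picture used in the sampling step can be made concrete with a toy example. The sketch below is a plain genetic algorithm, not AutoDock's own Lamarckian implementation or scoring function: the target pose, the squared-distance score and all parameter values are hypothetical and serve only to show selection, crossover and mutation acting on a six-gene pose (three translations and three rotations).

```python
import random

# Hypothetical "ideal" pose, used only to define a toy scoring function:
# three translations followed by three rotation angles.
TARGET_POSE = [1.0, -2.0, 0.5, 30.0, 60.0, 90.0]


def score(pose):
    """Toy score, lower is better: squared distance from TARGET_POSE."""
    return sum((p - t) ** 2 for p, t in zip(pose, TARGET_POSE))


def random_pose(rng):
    """A chromosome: each gene is one degree of freedom of the ligand."""
    translations = [rng.uniform(-5, 5) for _ in range(3)]
    rotations = [rng.uniform(0, 180) for _ in range(3)]
    return translations + rotations


def crossover(a, b, rng):
    """Single-point crossover between two parent chromosomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(pose, rng, rate=0.2):
    """Perturb each gene with probability `rate`."""
    return [g + rng.gauss(0, 1.0) if rng.random() < rate else g for g in pose]


def genetic_search(generations=200, pop_size=50, seed=0):
    rng = random.Random(seed)
    population = [random_pose(rng) for _ in range(pop_size)]
    initial_best = min(score(p) for p in population)
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[: pop_size // 2]  # selection: keep the better half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    best = min(population, key=score)
    return initial_best, best


initial_best, best_pose = genetic_search()
print(round(initial_best, 2), round(score(best_pose), 2))
```

Because the better half of each generation is carried over unchanged, the best score can never get worse from one generation to the next; the method remains stochastic, so different seeds give different final poses.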
2.9 Research Perspective
Due to the fast rate of mutation in the influenza A virus and the lack of proofreading by the viral polymerase (RdRp), the influenza A virus remains a moving target. Although vaccines against circulating viruses are updated every year, there is currently no stable cure for flu infection. The available drugs and vaccines show certain limitations, such as allergies, production cost and production duration. Apart from that, current viruses are developing resistance (Shen et al., 2015) against Food and Drug Administration (FDA) approved antiviral drugs (M2 ion channel based and neuraminidase based), namely the adamantane drugs and oseltamivir, which target the membrane proteins of the virus where the binding site is mutating (Ahmad et al., 2016). Thus, there is a need to identify a more potent target that covers a broader spectrum of subtypes. Multiple sequence alignment of the major proteins of the influenza A virus is to be analysed to find the protein that has remained conserved over time. Further, lead-like compound identification is done using a natural compound library.
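Lead-like compound identification of this kind usually starts with the drug-likeness filter described in Section 2.8.2. The sketch below shows how the four rule-of-five criteria can be checked once the descriptors of a compound are known. The class and function names are hypothetical, the descriptor values are illustrative rather than taken from this work, and boundary values (exactly 500 Da or log P of exactly 5) are treated as acceptable, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # molecular weight in Daltons
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors


def lipinski_violations(d):
    """Return the list of rule-of-five criteria that the compound violates."""
    violations = []
    if d.h_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if d.mol_weight > 500:
        violations.append("molecular weight above 500 Da")
    if d.log_p > 5:
        violations.append("log P above 5")
    return violations


def is_drug_like(d):
    """An orally active drug has no more than one violation."""
    return len(lipinski_violations(d)) <= 1


# Illustrative descriptor sets (hypothetical compounds).
small = Descriptors(mol_weight=180.2, log_p=1.2, h_donors=1, h_acceptors=4)
large = Descriptors(mol_weight=720.0, log_p=6.1, h_donors=7, h_acceptors=12)

print(is_drug_like(small), lipinski_violations(small))
print(is_drug_like(large), len(lipinski_violations(large)))
```

In practice the descriptors would come from a cheminformatics tool or from a server such as the Lipinski filter listed in the resources; only the decision logic is shown here.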
CHAPTER 3
Materials and Methodology

3.1 Resources
Databases:
NCBI PubMed
NCBI Influenza Resource Database (GenPept)
UniProt KB
BLAST
Protein Data Bank
DrugBank
PubChem / Ligand Explorer / BindingDB

Software:
MEGA 7.0.26 (Molecular Evolutionary Genetics Analysis)
AutoDock 4.2.5
Python 3.6.4
MGL Tools v1.5.6
Discovery Studio 2017 R2
Modeller 9.19
Open Babel v2.4.1
Chimera v1.12
SuiteMSA 1.3.22B

Other online server tools:
Clustal Omega / TCoffee
DoGSiteScorer (https://proteins.plus/)
Lipinski filter (http://www.scfbio-iitd.res.in/software/drugdesign/lipinski.jsp#anchortag)
PreADMET (https://preadmet.bmdrc.kr/description-of-preadmet/)
SwissADME

Protein validation server tools:
Rampage / Procheck
SAVES server (ERRAT)
ProSA

System configuration:
Windows 10 Home
Single processor: Intel (R) Core (TM) i5-5200U CPU @ 2.20GHz 2.20GHz
RAM: 4.00 GB
System type: 64-bit operating system, x64-based processor

3.2 Methodology

3.2.1 Target Identification
Influenza virus is categorised as a moving target. A literature review was carried out to find a potent druggable target against influenza A virus, but the reviews did not highlight a single most relevant target. Most of the works reported membrane proteins (neuraminidase) as targets, which are easily targetable, but variants resistant against the current protein targets have been reported. Thus, in this work the target protein is selected on the basis of phylogenetic analysis and functional importance. For the phylogenetic analysis, the sequences of the major proteins of influenza A virus were collected for different subtypes from the NCBI Influenza Resource Database (https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi#mainform), focusing mainly on complete sequences from 2016-2018. The sequences were collected based on the subtypes of major concern summarised by WHO in 2017 (http://www.who.int/influenza/vaccines/virus/201709_zoonotic_vaccinevirusupdate.pdf).
The sequences of the following subtypes (H1N1, H3N2, H5N8, H7N2 and H7N9) were sorted out, as they are considered to be of major concern. The accession numbers for the respective subtypes and their proteins are listed in Table 3.1 below:

Table 3.1: List of sequences with respective subtype and protein with accession IDs
Protein | H1N1 | H3N2 | H5N8 | H7N2 | H7N9
Polymerase Basic 1-F2 protein | ARG42338.1 | ARQ85075.1 | AQW44209.1 | ARG42777.1 | ARG43216.1
Hemagglutinin | APW84952.1 | APU51971.1 | AQW44212.1 | ARG42780.1 | AQW43714.1
Polymerase PB2 | ARQ85977.1 | AQS98278.1 | ARU07101.1 | ARG42775.1 | AUS83944.1
Matrix protein 1 | ARQ85089.1 | ARQ85819.1 | AQW44215.1 | ARG42783.1 | AQW43717.1
Matrix protein 2 | ARQ86866.1 | AQS97925.1 | ART29470.1 | ARG42784.1 | ASF57849.1
Nonstructural protein 1 | ARQ86202.1 | AQS96840.1 | ARX97491.1 | ARG42785.1 | ASF57850.1
Nuclear export protein | ARQ86203.1 | AQS96841.1 | ART29472.1 | ARG42786.1 | ASF57869.1
Neuraminidase | APW84953.1 | APW84947.1 | AQW44214.1 | ARG42782.1 | AQW43716.1
Nucleocapsid protein | ARQ85929.1 | ARQ85817.1 | AQW44213.1 | ARG42781.1 | AQW43715.1
PA-X protein | AQS98002.1 | ARQ86077.1 | AQW44211.1 | ARG42779.1 | AQW43713.1
Polymerase PA | AQS98001.1 | AQS98418.1 | AQW44210.1 | ARG42778.1 | AQW43712.1
Polymerase PB1 | ARQ86196.1 | AQS97389.1 | AQW44208.1 | ARG42776.1 | AUS83945.1

The abbreviated forms PB2, PB1, PA, HA, NP, NA, MP1, MP2 and NS2 are used here for the protein names. The dataset was prepared in FASTA format, highlighting the subtypes that were used for further analysis in Clustal Omega. Other datasets were excluded from the work because no proper analysis or conclusion could be drawn from them. For each protein, a separate MSA and phylogenetic analysis was conducted to find the protein that has remained most stable over time. The snapshot below shows the datasets prepared for each protein included in the NCBI Influenza Resource Database. The auxiliary proteins (PA-X and PB1-F2) were excluded from further analysis, and only full-length sequences with complete residue chains were used in this work.

Fig.
3.1 Datasets of each protein obtained from the NCBI Flu Database; the respective analyses were stored in the same folders.

3.2.2 Multiple Sequence Alignment
A number of tools are available to perform Multiple Sequence Alignment (MSA), such as Clustal, Muscle, TCoffee and the MSA program built into the MEGA software. Each has advantages and disadvantages, and several were useful in this work. MSA was done using Clustal Omega, an online server tool provided by EMBL-EBI (European Bioinformatics Institute) (https://www.ebi.ac.uk/) at https://www.ebi.ac.uk/Tools/msa/clustalo/. Clustal Omega is the most recent version of the Clustal series of computer programs. It uses seeded guide trees and HMM (Hidden Markov Model) profile-profile techniques to generate alignments between three or more sequences. The maximum capacity of this tool is 4000 sequences or a file size of 4 MB. Essentially, Clustal creates multiple sequence alignments through three main steps:
Do a pairwise alignment using the progressive alignment method
Create a guide tree (or use a user-defined tree)
Use the guide tree to carry out a multiple alignment

The TCoffee server was also used to generate the .aln format, which was used in the SuiteMSA software to calculate the percentage of conserved positions. The output MSA of TCoffee is very basic and user friendly.

SuiteMSA 1.3.22B
SuiteMSA is a Java-based application that provides unique MSA viewers and can directly compare multiple MSAs and evaluate where the MSAs agree (are consistent) or disagree (are inconsistent). It is a visual tool for the comparison and analysis of multiple sequence alignments (Anderson et al., 2011). It helps to calculate the percentage of gaps and the percentage of conserved regions in the aligned sequences. It supports FASTA files in which all sequences have the same length; otherwise an aligned file in the .aln format is supported, which can be obtained using the TCoffee software.
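The two statistics mentioned above, percentage of conserved positions and percentage of gaps, can be sketched in a few lines of Python. This is an illustration of the idea and not SuiteMSA's own algorithm: a column is counted as conserved only when every sequence carries the same residue and no gap, which is an assumption, and the three short aligned sequences are toy data.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def alignment_stats(records):
    """Return (percent conserved columns, percent gap characters)."""
    seqs = list(records.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    conserved = 0
    gap_cells = 0
    for column in zip(*seqs):
        gap_cells += column.count("-")
        # Conserved: one residue type in the column and no gap.
        if "-" not in column and len(set(column)) == 1:
            conserved += 1
    percent_conserved = 100.0 * conserved / length
    percent_gaps = 100.0 * gap_cells / (length * len(seqs))
    return percent_conserved, percent_gaps


# Toy aligned sequences (hypothetical, for illustration only).
toy_alignment = """>seq1
MASQGTKRSY
>seq2
MASQGTKRSY
>seq3
MATQG-KRSY
"""

percent_conserved, percent_gaps = alignment_stats(read_fasta(toy_alignment))
print(round(percent_conserved, 1), round(percent_gaps, 1))  # 80.0 3.3
```

In the toy alignment, 8 of the 10 columns are identical and gap-free, and 1 of the 30 characters is a gap.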
SuiteMSA runs on the Java platform, uses the MUSCLE or Clustal methods for alignment and also reports statistical results. The image below shows the software GUI and the output generated.

Fig. 3.2 Window of SuiteMSA and the MSA output viewer

3.2.3 Phylogeny

Molecular Evolutionary Genetics Analysis (MEGA) software provides user-friendly tools for the statistical analysis of DNA and protein sequences from an evolutionary perspective by constructing phylogenetic trees. The project was developed at the Pennsylvania State University under the leadership of Masatoshi Nei with co-workers Sudhir Kumar and Koichiro Tamura. The software is available at the MEGA webpage (https://www.megasoftware.net/). The major features provided by MEGA 7.0.26 (January 2016) are sequence alignment construction (integrated web browser, sequence fetching, alignment editor, multiple sequence alignment), data handling (extended MEGA format to save all data attributes, import of data from other formats such as Clustal and Nexus, data explorers), genetic code table selection, sequence data viewer (highlighting, data export, estimation of statistical quantities), distance estimation methods and tree making (Neighbor joining, Minimum evolution, UPGMA, Maximum parsimony, Maximum likelihood, bootstrap phylogeny test, distance matrix viewer) (Kumar et al., 2003).

In this work, sets of orthologous amino acid sequences (sequences of the same protein from different subtypes and hosts, descended from a common ancestor) of Influenza A virus from the years 2016-2018 were used. The proteins selected for the phylogenetic study were HA, NA, M2, M1, PA, PA-X, PB1, PB2, PB1-F2, NP, NS1 and NEP. Five subtypes (H1N1, H3N2, H5N8, H7N2 and H7N9) from different hosts such as human, avian, cat, horse and swine were selected from the NCBI Flu database. The FASTA sequences of all protein types were collected and a single file of 50 sequences, named CLUSTER.FASTA, was prepared containing all the protein types.
This CLUSTER.FASTA file was used for phylogenetic analysis in MEGA 7.0.26, which is freely available and user friendly. The software supports multiple sequence alignment in two ways, by ClustalW and by MUSCLE. ClustalW performs global alignment while MUSCLE performs a combination of global and pairwise alignment; thus, the MUSCLE program is preferable for protein sequences. MUSCLE uses progressive alignment (Edgar, 2004). The aligned sequences were then subjected to phylogenetic tree analysis using the Maximum Likelihood method, a character-based method that is preferred when the similarity changes among the set of sequences are very low (Hall et al., 2013). Maximum likelihood (ML) turns the phylogenetic problem inside out: ML searches for the evolutionary model, including the tree itself, that has the highest likelihood of producing the observed data. The likelihood is derived for each position in the alignment, as the probability that the observed pattern of variation at that site would be produced under the given tree and model. The tree was evaluated by bootstrap resampling with 500 replicates. The result of a bootstrap analysis is typically a number associated with a particular branch in the phylogenetic tree that gives the proportion of bootstrap replicates supporting the monophyly of the clade. A support value above 90 is considered better. The target was selected in this manner, which showed the evolutionary consistency of two proteins, namely Nucleoprotein and Matrix protein 1. Of the two, the functionally more important protein, Nucleoprotein, was selected as the target.
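The bootstrap idea used above can be illustrated with a small Python sketch. It resamples alignment columns with replacement and reports the percentage of replicates in which a grouping is recovered. As a loudly stated simplification, a p-distance comparison on three sequences stands in for the Maximum Likelihood tree search that MEGA performs, so this is not the procedure used in this work; all names and the toy data are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def clade_support(alignment, replicates=500, seed=0):
    """Percentage of replicates in which sequences 1 and 2 group together.

    Assumption: exactly three aligned sequences; "grouping" means that the
    first two are closer to each other than either is to the third.
    A real analysis would infer an ML tree for every replicate instead.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        a, b, c = bootstrap_columns(alignment, rng)
        if p_distance(a, b) < min(p_distance(a, c), p_distance(b, c)):
            hits += 1
    return 100.0 * hits / replicates


if __name__ == "__main__":
    toy = ["MASQGTKR", "MASQGTKK", "MTNQATRR"]  # toy data, not from this work
    print(f"support for (seq1, seq2): {clade_support(toy):.1f}%")
```

The printed value plays the same role as the number MEGA attaches to a branch: the share of the 500 resampled datasets that support the grouping.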
Fig. 3.3 MEGA program running

3.2.4 Target Selection and Homology Modelling of Nucleoprotein of Influenza A Virus H3N2 strain as receptor

The protein databases (PDB, UniProtKB) were searched for a 3D structure of the H3N2 nucleoprotein of influenza A virus; no crystal structure was found for this protein and sequence. The work was therefore directed to homology modelling, a comparative method for predicting a 3D model. Modelling of the receptor protein was performed with the Modeller 9.19 software (Webb et al., 2016).

3.2.4.1 Steps in comparative modelling

A homology modelling pipeline generally comprises the following steps, which can be repeated until a suitable model is obtained:
(i) template selection, to identify the most suitable experimentally determined structures;
(ii) target-template sequence alignment;
(iii) 3D model structure building;
(iv) model refinement; and
(v) model quality estimation.
Model refinement usually involves clash removal and geometrical regularization of bond lengths and angles, but can also involve additional, more sophisticated structural amendments. As a rule of thumb, most attention should be devoted to steps i, ii, iii and v, whereas global model refinement (iv) typically has a disappointing return on investment (Schmidt et al., 2014).

3.2.4.2 Input files needed for Modeller

The input file for Modeller uses the PIR format, not the FASTA format. The target sequence file was saved as qseq1.ali:
>P1;qseq1
sequence:qseq1:::::::0.00: 0.00
MASQGTKRSYEQMETDGDRQNATEIRASVGKMIDGIGRFYIQMCTELKLSDHEGRLIQNSLTIEKMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYRRVDGKWMRELVLYDKEEIRRIWRQANNGEDATSGLTHIMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGIGTMVMELIRMIKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMVDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACAYGPAVSSGYDFEKEGYSLVGIDPFKLLQNSQIYSLIRPNENPAHKSQLVWMACHSAAFEDLRLLSFIRGTKVSPRGKLSTRGVQIASNENMDNMGSSTLELRSGYWAIRTRSGGNTNQQRASAGQTSVQPTFSVQRNLPFEKSTIMAAFTGNTEGRTSDMRAEVIRMMEGAKPEEVSFRGRGVFELSDEKAANPIVPSFDMSNEGSYFFGDNAEEYDN*

The selected templates are saved in .pdb format and can be named tseq1, tseq2 and so on. Apart from the above, Modeller also needs five Python script files (.py). script1.py aligns the query sequence with the available templates, producing log.txt and the qseq1-tseq1.ali file as output. script2.py shows which of the templates is most closely related to the query sequence. script3.py aligns the selected template with qseq1. script4.py generates models of the query sequence, with the number of models given in the script. The log.txt file reports the DOPE score, the GA341 score and other measures for each model generated. Lastly, script5.py is used to evaluate the models based on their DOPE and GA341 scores. All processing was done by running Modeller as administrator and issuing the following commands in the directory where the input files were saved.

C:\Program Files\Modeller9.19>cd C:\Program Files\Modeller9.19\bin\new np
C:\Program Files\Modeller9.19\bin\new np>dir
C:\Program Files\Modeller9.19\bin\new np>mod9.19 script1.py
C:\Program Files\Modeller9.19\bin\new np>mod9.19 script2.py
C:\Program Files\Modeller9.19\bin\new np>mod9.19 script3.py
C:\Program Files\Modeller9.19\bin\new np>mod9.19 script4.py
C:\Program Files\Modeller9.19\bin\new np>mod9.19 script5.py

Fig. 3.4 Modeller 9.19 running and the commands used.

3.2.4.3 Validation of Models generated

The generated models were validated using the Rampage Ramachandran plot server and the ProSA and ERRAT servers, which assess the models on the basis of Z-score and residue quality factors respectively. The best model was then superimposed on the selected template using Chimera v1.12, which gave the sequence identity percentage between model and template and highlighted the superimposed and non-superimposed regions.

3.2.5 Active Site Prediction

After assessment of the generated models, the active site residues of the final model were identified through the DoGSiteScorer and CASTp server tools (Suganya et al., 2016). Many server tools are available for identifying the pocket regions of a protein, such as MetaPocket, Q-SiteFinder, CASTp, DoGSiteScorer and LIGSITE. DoGSiteScorer (http://dogsite.zbh.unihamburg.de/) is an online server tool for binding site prediction. It reports all the pockets of the macromolecule, and the pocket with the greatest volume and drug score is considered the favourable pocket or binding site. It detects active sites not only on the surface of the protein but also within the molecule. CASTp (http://sts-fw.bioengr.uic.edu/castep/calculation.php) is also an online server tool; it calculates the active site volume and lists the residues.

3.2.6 Ligand Library Preparation

The ligands were selected from the literature and are natural compounds from plant sources. The listed compounds have been used against different protein targets of influenza A virus such as Neuraminidase and Hemagglutinin. Most work has been done on the membrane proteins, and very few published studies address other targetable proteins with natural compounds, which are expected to have nil or very low toxic effect for patients.
The library was prepared using the PubChem, SuperNatural and ChEMBL databases for natural compounds with 3D structures, considering the molecular formula, molecular weight, IUPAC name and CID.

3.2.7 Literature based natural compound collection

In the table below the PubChem CID is given in parentheses after the compound name, and a dash marks an empty cell.

S. No. | Compound | Plant source | Target | References
1 | Hesperidin (10621) | - | NA | Sharma et al., 2010
2 | Narirutin (442431) | - | NA | Sharma et al., 2010
3 | Proanthocyanidin | - | NA | Sharma et al., 2010
4 | Ursolic (64945) | - | NA | Sharma et al., 2010
5 | Tangeretin (68077) | - | NA | Sharma et al., 2010
6 | Abyssinone | - | NA | Sharma et al., 2010
7 | Nobiletin (72344) | - | NA | Sharma et al., 2010
8 | Tannic acid | - | NA | Sharma et al., 2010
9 | Sitosterol | - | NA | Sharma et al., 2010
10 | Ellagic acid (5281855) | - | - | Pavlova et al., 2016
11 | Vitamin E (20353) | - | - | Pavlova et al., 2016
12 | Vitamin C (Ascorbic acid) (54670067) | - | - | Pavlova et al., 2016
13 | Andrographolide (5318517) | Andrographis paniculata | - | Seniya et al., 2014
14 | Arabinoxylan (6438923) | - | - | Seniya et al., 2014
15 | Epigallocatechin (65064) | - | - | Seniya et al., 2014
16 | Imperatorin (10212) | - | - | Seniya et al., 2014
17 | C20H30O5 (29927575) | - | - | Seniya et al., 2014
18 | C20H30O5 (6857767) | - | - | Seniya et al., 2014
19 | C20H30O5 (11382524) | - | - | Seniya et al., 2014
20 | Andrographis C20H30O5 (5318517) | - | - | Seniya et al., 2014
21 | Naringin (4441) | - | - | Villa et al., 2016
22 | Hesperetin (72281) | - | - | Villa et al., 2016
23 | Procyanidin B-2 3,3′-di-O-gallate | Tea polyphenols | IAV, IBV | Yang et al., 2013
24 | Procyanidin B2 (122738) | Tea polyphenols | IAV, IBV | Yang et al., 2013
25 | Theaflavin (114777) | Tea polyphenols | IAV, IBV | Yang et al., 2013
26 | Theaflavin digallate (44448535) | Tea polyphenols | IAV, IBV | Yang et al., 2013
27 | Kaempferol (5280863) | Tea polyphenols | IAV, IBV | Yang et al., 2013
28 | Quercetin (5280343) | Tea polyphenols | IAV, IBV | Yang et al., 2013; Villa et al., 2016
29 | Myricetin (5281672) | Tea polyphenols | IAV, IBV | Yang et al., 2013
30 | Dihydromyricetin | - | - | -
31 | 5,6,3′,4′-tetra-O-methylquercetin | - | - | -
32-33 | 5,7-dihydroxy-4-oxo-2-(3,4,5-trihydroxyphenyl)chroman-3yl-3,4,5-trihydroxycyclohexanecarboxylate | - | - | -
34 | 3-Caffeoylquinic acid (1794427) | - | - | -
35 | Procyanidin B1 (11250133) | - | - | -
36 | (+)-catechin (1203) | Tea polyphenols | IAV, IBV | Yang et al., 2013; Villa et al., 2016
37 | (-)-epicatechin (72276) | Tea polyphenols, Murraya koenigii | IAV, IBV | Yang et al., 2013; Suganya et al., 2016
38 | (-)-epigallocatechin (EGC) (72277) | Tea polyphenols | IAV, IBV | Kannan et al., 2017; Yang et al., 2013
39 | (-)-epicatechin gallate (ECG) (107905) | Tea polyphenols | IAV, IBV | Kannan et al., 2017; Yang et al., 2013
40 | (-)-epigallocatechin gallate (EGCG) (65064) | Tea polyphenols | IAV, IBV | Kannan et al., 2017; Yang et al., 2013; Villa et al., 2016
41 | Cur/Curcumin (CI) | - | - | Kannan et al., 2017; Vora et al., 2017
42 | Curcumin (969516) | Curcuma longa | - | Vora et al., 2017
43 | DMC/Demethoxycurcumin (CII) (5315472) | - | - | Kannan et al., 2017
44 | BDMC/Bisdemethoxycurcumin (CIII) (46946863) | - | - | Kannan et al., 2017
45 | Andrographolide (Andro) (5318517) | - | - | Kannan et al., 2017
46-47 | 14-dehydroxyandrographolide-12-sulfonic acid sodium salt (DASS) (101928943) | - | - | Kannan et al., 2017
48 | 14-α-lipoylandrographolide (AL-1) (101476459) | - | - | Kannan et al., 2017
49 | Nimbaflavone (14492795) | Azadirachta indica | - | Ahemad et al., 2016; Vora et al., 2017
50 | Nimbidinin (101306757) | Azadirachta indica | - | Vora et al., 2017
51 | Nimbolide (86287562) | Azadirachta indica | - | Vora et al., 2017
52 | Rutin (5280805) | Azadirachta indica, Murraya koenigii, Solanum torvum | NS1, NA | Ahemad et al., 2016; Suganya et al., 2016; Ahmad et al., 2015
53 | Hyperoside (5281643) | Azadirachta indica | NS1, NA | Ahemad et al., 2016; Ahmad et al., 2015
54 | 7-O-galloyltricetiflavan (11669392) | Lepidium meyenii | - | Mendoza et al., 2014
55 | 7,4′-di-O-galloyltricetiflavan (11999968) | Lepidium meyenii | - | Mendoza et al., 2014
56 | Quercetin-3-O-β-D-glucopyranosyl (10463057) | Lepidium meyenii | - | Mendoza et al., 2014
57 | Ribavirin (37542) | Lepidium meyenii | - | Mendoza et al., 2014
58 | Quercitrin (5280459) | - | NS1 | Ahmad et al., 2015
59 | Tiplasinin (6450819) | - | NS1 | Ahmad et al., 2015
60 | Tetratriacontane (26519) | - | NS1 | Ahmad et al., 2015
61 | ( )-Nimocinolide (6442906) | - | NS1 | Ahmad et al., 2015
62 | 127-40-2 (46835684) | - | NS1 | Ahmad et al., 2015
63 | 6-O-acetylnimbandiol | - | NS1 | Ahmad et al., 2015
64 | Allicin (65036) | Allium sativum | HA | Chavan et al., 2018
65 | Plumbagin (10205) | Plumbago indica | HA | Chavan et al., 2018
66 | - | Dacrydium cupressinum (diterpenoids) | HA | Dang et al., 2015
67 | Totarol (92783) | Podocarpus totara | HA | Dang et al., 2015
68 | Amidinomycin (160703) (toxic), nofomicin | Streptomyces species | - | Korshin et al., 2013
69 | Matteflavoside (A-G) | Matteuccia struthiopteris (flavonoid) | NA | Li et al., 2015
70 | Sesquiterpenoid | Phellinus igniarius | NA | Song et al., 2013
71 | Caffeic acid (689043) | - | NA | Xie et al., 2013
72 | Nimbaflavone (44814409) | Azadirachta indica | NP | Ahmad et al., 2016
73 | Rutin (49837869) | Azadirachta indica | NP | Ahmad et al., 2016
74 | Hyperoside (53361971) | Azadirachta indica | NP | Ahmad et al., 2016
75 | Gingerol (442793) | Zingiber officinale | - | Suganya et al., 2016
76 | Solasonine (119247) | Solanum nigrum | - | Suganya et al., 2016
77 | Piperine (2763851) | Piper nigrum | - | Suganya et al., 2016
78 | Cuminaldehyde (326) | Cuminum cyminum | - | Suganya et al., 2016
79 | Piperidine (8082) | Piper nigrum | - | Suganya et al., 2016
80 | Solanine (262500) | Solanum nigrum | - | Suganya et al., 2016
81 | Catechin (9064) | Solanum torvum | - | Suganya et al., 2016
82 | Caffeic acid (689043) | Solanum torvum | - | Suganya et al., 2016
83 | Gallic acid (370) | Azadirachta indica | - | Suganya et al., 2016
84 | Thymol (6989) | Trachyspermum ammi | - | Suganya et al., 2016
85 | Zingiberene (92776) | Zingiber officinale | - | Suganya et al., 2016
86 | Diosgenin (99474) | Trigonella foenum | - | Suganya et al., 2016
87 | Ferulic acid (445858) | Solanum torvum | - | Suganya et al., 2016
88 | Gamma-terpinene (7461) | Cuminum cyminum | - | Suganya et al., 2016
89 | Myricetin (5281672) | Murraya koenigii | - | Suganya et al., 2016
90 | Nimbolide (100017) | Azadirachta indica | - | Suganya et al., 2016
91 | Trigonelline (5570) | Trigonella foenum | - | Suganya et al., 2016
92 | Mahanimbine (16793) | Murraya koenigii | - | Suganya et al., 2016
93 | Luteolin (5280445) | Cuminum cyminum | - | Suganya et al., 2016
94 | Solasodine (5250) | Solanum torvum | - | Suganya et al., 2016
95 | Beta-pinene (14896) | Cuminum cyminum | - | Suganya et al., 2016

The 3D structures were downloaded in .sdf format and converted into .pdb format with the help of the Open Babel software (W J Geldenhuys, 2006). Babel is a cross-platform program designed to convert chemical objects (currently molecules) from one file format to another.

Fig. 3.5 Open Babel screen, where the left side is the input space for the file to be converted and the right side gives the output in the desired chemical file format.

3.2.7 AutoDock Vina

AutoDock Vina is an open-source program for molecular docking. It was designed and implemented by Dr. Oleg Trott in the Molecular Graphics Lab at The Scripps Research Institute.

File preparation for AutoDock Vina

Protein preparation: the target protein was supplied in .pdb format; polar hydrogen atoms were added and Kollman charges were assigned to the receptor. The file was then saved in .pdbqt format, which was used for grid formation.
Grid box formation: the grid was positioned over the region of the active site, the x-axis, y-axis and z-axis dimensions and the centre were set, and the session was saved. The file was then saved in the .pdbqt format supported by AutoDock.

Ligand preparation: the .sdf files of the ligands were converted into .pdb using the Open Babel software. The torsions were then set and the output files were saved in .pdbqt format.

The required files were put in the Vina folder, i.e. ligand.pdbqt, receptor.pdbqt and the conf.txt file that contains the dimensions of the grid box. Docking was then performed from the command prompt (cmd). The search algorithm gave all the possible poses, and the scoring function ranked them so that the best pose, with the best binding energy and the most stable position of the protein-ligand complex, could be split out. The protein-ligand analysis was done by opening the Vina result in the Analysis tab of the AutoDock software. For the search program and the scoring function, the following commands were used on the command line after changing to the working directory:

"C:\Program Files (x86)\The Scripps Research Institute\Vina\vina.exe" --config conf.txt --log log.txt
"C:\Program Files (x86)\The Scripps Research Institute\Vina\vina_split.exe" --input ligand_out.pdbqt

Fig. 3.6 The left image shows the active region of the target protein (Nucleoprotein) and the right image shows the grid box generated for the macromolecule.

ADME/T Property Prediction

To verify the suitability of the selected molecule as a lead compound, ADME property prediction was performed using the online preADME/T prediction server (https://preadmet.bmdrc.kr/description-of-preadmet/) (Lee et al., 2003, 2004). The properties predicted were drug likeness, ADME (Absorption, Distribution, Metabolism and Excretion) and toxicity (Bennion et al., 2017). A number of properties can be predicted by this server, such as Caco2 permeability, plasma protein binding (PPB), mutagenicity and carcinogenicity, Lipinski filtration etc.
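The docking inputs described for AutoDock Vina above can be assembled programmatically. The sketch below builds the text of a conf.txt file and the matching command line. The key names (receptor, ligand, center_x, size_x, exhaustiveness and so on) are standard Vina configuration options, and --config and --log are the options used in the commands of this work; the grid centre and size values shown are hypothetical placeholders, not the values used for the nucleoprotein model, and the helper names are illustrative.

```python
def vina_config_text(receptor, ligand, center, size, exhaustiveness=8):
    """Return the contents of a Vina configuration file as a string.

    center and size are (x, y, z) tuples in Angstrom describing the grid box.
    """
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {center[0]}",
        f"center_y = {center[1]}",
        f"center_z = {center[2]}",
        f"size_x = {size[0]}",
        f"size_y = {size[1]}",
        f"size_z = {size[2]}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"


def vina_command(vina_exe, config="conf.txt", log="log.txt"):
    """Return the argument list for one docking run (suitable for subprocess.run)."""
    return [vina_exe, "--config", config, "--log", log]


if __name__ == "__main__":
    # Hypothetical grid box; replace with the values read from the saved grid session.
    text = vina_config_text(
        "receptor.pdbqt", "ligand.pdbqt",
        center=(10.0, 12.5, -3.0), size=(24, 24, 24),
    )
    print(text)
    print(" ".join(vina_command("vina.exe")))
```

Writing the returned text to conf.txt and looping over the ligand files gives one docking run per compound in the library.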
METHODOLOGY IN FLOWCHART FORM

Literature based target identification against Influenza A Virus
Discussed new approaches for targets and limitations, but no confirmed target
Making datasets of all major protein sequences of IAV in FASTA format
MSA and phylogeny of orthologous sets of proteins of different subtypes of IAV
Performing MSA of selected proteins and selecting the target sequence
Validation of the selected target protein by sequence and structure alignment against the BLASTp database for all the proteins
Selecting the sequence for homology modelling (Modeller 9.19) and validation of the modelled protein (Rampage, Procheck, ProSA, ERRAT)
Drug library preparation (Drug Bank and literature)
Natural compound based ligand library preparation (using PubChem)
Molecular docking studies using AutoDock 4.2
Lipinski filtering of the top hit compounds and ADME/T prediction of lead-like compounds
Protein-ligand interaction analysis

CHAPTER 4

4. Results and Discussion

4.1 Phylogeny

The tree was developed using 50 sequences (5 subtypes x 10 proteins), covering all the proteins important for the virus. Full-length protein sequences from different hosts were used and partial sequences were excluded from the work. The sequences were downloaded in FASTA format from the NCBI Flu Database. The amino acid sequences were aligned with the built-in MUSCLE program of MEGA 7 (Hall et al., 2013), and a Maximum Likelihood tree was constructed using default parameters with 500 bootstrap replicates (Huang et al., 2013, Hall et al., 2013). The resulting unrooted tree of the different IAV proteins from different hosts revealed that, although the sequences come from different host organisms (human, birds, feline, swine), the proteins cluster together on the basis of their characters.
The tree is clustered into two major clades (Clade 1 and Clade 2). Clade 1 consists of 7 of the proteins (NS1, NS2, PB2, PA, PB1, NA, NP) while Clade 2 comprises 3 proteins (HA, MP1, MP2). Clade 1 is further divided into two sub-clades, Clade (1a) and Clade (1b). Sub-clade (1a) consists of 5 proteins (NS1, NS2, PB2, PA and PB1), whose functions are interlinked during replication and transcription of the viral genome. Three of these proteins (PB2, PB1 and PA) function as the polymerase complex and are the products of segments 1, 2 and 3 respectively. PB1 forms a secondary node from which the other four proteins (PA, PB2, NS2 and NS1) branch according to their functional relationship and their distances from the PB1 node. NS1 and NS2 are directly related and are both products of segment 8, from different reading frames. NS1 and NS2 branch directly with PB2, as their functions are also inter-dependent.

Fig. 4.1 Phylogenetic tree constructed using the Maximum Likelihood method with 500 bootstrap replicates, showing Clade 1 with sub-clade (1a) (PB2, NS2, PB1, PA, NS1) and sub-clade (1b) (NA, NP), and Clade 2 (HA, MP1, MP2). The abbreviations PB2, PB1, PA, HA, NP, NA, MP1, MP2 and NS2 are used here for the protein names.

In Clade 1, sub-clade (1b) is the most important region of the tree: one protein is a membrane protein (NA) while the other is an intracellular protein (NP) that interacts directly with the genome of the influenza virus. They are the products of segments 6 and 5 respectively. NA is the protein currently used as a drug target, but resistance to NA inhibitors has been reported. NP is closely related to NA, so it can be proposed as a potentially good target protein. Clade 2 contains the gene products of segment 4 (HA protein) and segment 7 (Matrix protein 1 and Matrix protein 2), where MP2 and HA are membrane proteins that act together during the infection cycle. MP1 provides the body of the virus.
The above phylogenetic tree shows how the proteins are functionally related to one another. Functionally similar proteins cluster together even though they belong to different subtypes (H1N1, H3N2, H5N8, H7N2 and H7N9) and come from different hosts (human, birds, feline, swine). Here, orthologous sets of influenza virus sequences (genes or proteins evolved from a common ancestor) were used. The target selected is the NP protein of the H3N2 subtype. Since H3N2 strains in 2017 were unaffected by the vaccines and antiviral resistance has already been reported, the H3N2 subtype was selected as the target. The selected target was validated by re-aligning the sequences of each protein separately, and sequence and structural alignment was also done for each protein separately.

4.1.1 Multiple sequence Alignment analysis

The MSA analysis of each protein using the SuiteMSA software revealed that HA, NA, MP2, PB1-F2, PA-X and NS1 do not have the same amino acid sequence lengths, whereas PB2, PB1, PA, NP, MP1 and NS2 keep the same sequence length over time and across different hosts, as shown in Table 4.1.

Table 4.1 MSA analysis using SuiteMSA software showing conserved percentages of only the fixed-length proteins.

Data achieved | PB2 | PB1 | PA | NP | MP1 | NS2
Gap % | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
% conserved | 83.7 | 83.0 | 79.9 | 74.9 | 74.6 | 62.0
% columns un-gapped | 100 | 100 | 100 | 100 | 100 | 100
Protein length | 759 | 757 | 716 | 498 | 252 | 121

The abbreviations PB2, PB1, PA, NP, MP1 and NS2 are used here for the protein names. PB2, PB1 and PA show the highest percentage of conserved regions, NP and MP1 show almost the same percentage, and NS2 shows the least. The influenza genome is bound mainly by Nucleoprotein, and this protein is more abundant than the other internal proteins.
MSA comparison of NA protein sequences and NP protein sequences

A comparative analysis of NP with NA and MP1 sequences was made using the SuiteMSA software. It revealed that NA, the current drug target protein, is less conserved among sequences from different host origins and also within the same subtype (IAV H3N2 is used in this work as the target subtype).

Table 4.2 MSA analysis of aligned sequences of NA, NP and MP1 proteins, showing that NP is more conserved than NA within the same subtype (H3N2) and also among different subtypes.

Data achieved | Nucleoprotein, same subtype (H3N2) | Nucleoprotein, different subtypes | Neuraminidase, same subtype (H3N2) | Neuraminidase, different subtypes | Matrix protein 1, same subtype | Matrix protein 1, different subtypes
Gap % | 0.0 | 0.0 | 0.1 | 3.9 | 0.0 | 0.0
% conserved | 98.2 | 85.9 | 91.3 | 24.8 | 96.4 | 83.3
% columns un-gapped | 100 | 100 | 98.9 | 89.5 | 100 | 100
Protein length | 498 | 498 | 496 (different lengths) | 484 (different lengths) | 252 | 252

Because NA is variable in sequence length, its sequences were aligned using the T-Coffee server to obtain .aln files for the same subtype and for different subtypes. The analysis revealed that NA is highly conserved (91.3%) within H3N2 (same subtype), while conservation among the different subtypes circulating in humans, birds and other hosts is only 24.8%. For NP, the respective conservation percentages were 98.2% (within the H3N2 subtype) and 85.9% (across different subtypes), which is greater than for both NA and MP1. MP1 showed 96.4% and 83.3% conservation, which is also better than NA but lower than NP. For comparison, Hemagglutinin is reported to show 15% amino acid sequence diversity within the same subtype and 40-60% diversity between different subtypes (Sautto et al., 2018). These results confirm NP as a suitable target protein.
BLASTp Results for all IAV Proteins with Available Crystal Structure

The sequences were cross-validated against the structures available in the PDB; the results are shown in Table 4.3 and a graphical representation is shown in Fig. 4.2. The other proteins of influenza A virus were not included in the MSA tables above because their conserved percentage could not be calculated owing to inconsistent sequence lengths.

Table 4.3 Structure and sequence alignment for all proteins

Protein name | Query coverage % | Identity % | Average
Polymerase Basic 2 protein | 99 | 67 | 83
Polymerase Basic 1 protein | 99 | 80 | 89.5
Polymerase Basic 1-F2 protein | 40 | 70 | 55
Polymerase Acidic protein | 99 | 70 | 84.5
Polymerase Acidic-X | 81 | 92 | 86.5
Hemagglutinin | 89 | 97 | 93
Nucleoprotein | 100 | 91 | 95.5
Neuraminidase | 100 | 84 | 92
Matrix protein 1 | 65 | 99 | 82
Matrix protein 2 | 44 | 86 | 65
Non-Structural protein 1 | 100 | 83 | 91.5
Non-Structural protein 2 (Nuclear Export Protein, NEP) | 47 | 93 | 70

The table was prepared from the BLASTp results of each protein against the PDB, giving the query coverage and the percentage identity; the average of the two values shows that NP is the most potent target. The data are represented graphically in Fig. 4.2.

Fig. 4.2 Graph of query coverage percent (blue bars) and identity percent (red bars) used to validate the target. The black line marks the average of coverage and identity percentage.

The MSA and BLASTp results revealed that membrane proteins were less conserved than internal proteins; the sequences of HA, NA and MP2 were less consistent. Targeting an internal protein was a challenge, so the functionally important protein was selected as the target protein. The cluster phylogeny of all proteins also revealed that NP and NA fall in the same clade. Overall, NP emerges as a potentially better target that can cover a broader spectrum of subtypes.

4.1.2 Retrieval of Sequence

The sequence was retrieved for accession number ARQ85817.1 from the NCBI GenPept database.
The sequence is

>ARQ85817.1 nucleocapsid protein Influenza A virus (A/California/49/2017(H3N2))
MASQGTKRSYEQMETDGDRQNATEIRASVGKMIDGIGRFYIQMCTELKLSDHEGRLIQNSLTIEKMVLSAFDERRNKYLEEHPSAGKDPKKTGGPIYRRVDGKWMRELVLYDKEEIRRIWRQANNGEDATSGLTHIMIWHSNLNDATYQRTRALVRTGMDPRMCSLMQGSTLPRRSGAAGAAVKGIGTMVMELIRMIKRGINDRNFWRGENGRKTRSAYERMCNILKGKFQTAAQRAMVDQVRESRNPGNAEIEDLIFLARSALILRGSVAHKSCLPACAYGPAVSSGYDFEKEGYSLVGIDPFKLLQNSQIYSLIRPNENPAHKSQLVWMACHSAAFEDLRLLSFIRGTKVSPRGKLSTRGVQIASNENMDNMGSSTLELRSGYWAIRTRSGGNTNQQRASAGQTSVQPTFSVQRNLPFEKSTIMAAFTGNTEGRTSDMRAEVIRMMEGAKPEEVSFRGRGVFELSDEKAANPIVPSFDMSNEGSYFFGDNAEEYDN

The sequence was used for further analysis, namely homology modelling and docking. The H3N2 subtype was selected because most of the cases reported in 2017 were due to Influenza A virus H3N2.

4.2 Homology Modelling

4.2.1 Template Searching

Since a 3D structure for this particular H3N2 protein was not available, a homology-based model was prepared. Template selection was done by running BLASTp for the query sequence with accession number ARQ85817.1. The templates were selected on the basis of query coverage and identity percentage, and the X-ray resolution of the PDB structures was also considered. The graphical representation of the BLASTp result is shown in Fig. 4.3.

Fig. 4.3 BLASTp result against the PDB for the search of templates.

Table: BLASTp result of best PDB IDs

Accession No. | Query coverage % | Identity % | Resolution (Å)
5TJW | 100 | 91 | 3.2
3ZDP | 100 | 91 | 2.7
2Q06 | 100 | 90 | 3.3

Homology modelling and Validation

BLASTp result for Nucleoprotein of the selected query sequence for homology modelling

Fig. 4.4 Alignment of the query sequence with the templates

Figure 4 shows the DOPE scores of all the models generated; the model with the lowest DOPE value is considered the best model.
Summary of successfully produced models by Modeller 9.19:

Filename | molpdf | DOPE score | GA341 score
qseq1.B99990001 | 2357.06909 | -50732.55859 | 1.00000
qseq1.B99990002 | 2266.21240 | -50201.01563 | 1.00000
qseq1.B99990003 | 2378.13721 | -50386.50391 | 1.00000
qseq1.B99990004 | 2279.57153 | -50766.84375 | 1.00000
qseq1.B99990005 | 2377.82813 | -50846.16406 | 1.00000

4.3 Validation of Protein models generated

The models generated by homology modelling were validated through the Rampage online tool, ProSA and ERRAT, which use different criteria to analyse the protein model.

Result of Rampage and Procheck

Ramachandran plot generated using the Rampage plot analysis server

The ProSA result shows the model as a black dot in the X-ray region of the plot of the non-redundant PDB structures (Wiederstein et al., 2007), where the Z-score is plotted against the number of residues. The modelled protein lies in the region of 400 to 600 residues. The overall model quality is given by a Z-score of -9.17 for the residue length of 498.

Table: Summary of the validation results

Model No. | Residues in the favoured region (~98.0% expected) | Residues in the allowed region (~2.0% expected) | Residues in the outlier region | ERRAT quality factor | ProSA Z-score
qseq1.B99990001 | 482 (97.2%) | 13 (2.6%) | 1 (0.2%) | 86.9281 | -9.35
qseq1.B99990002 | 485 (97.8%) | 10 (2.0%) | 1 (0.2%) | 82.3285 | -9.12
qseq1.B99990003 | 486 (98.0%) | 8 (1.6%) | 2 (0.4%) | 86.9198 | -9.17
qseq1.B99990004 | 484 (97.6%) | 8 (1.6%) | 4 (0.8%) | 82.7731 | -9.1
qseq1.B99990005 | 484 (97.6%) | 9 (1.8%) | 3 (0.6%) | 85.1613 | -9.1

ERRAT result

In the ERRAT graphical result the green bars show the relaxed energies of the residues, the red graph lines indicate residues with higher energies and yellow indicates residues with average energies (Colovos & Yeates, 1993). ERRAT gives the overall quality of the modelled protein as a quality factor of 85.9196, which is a satisfactory value.

Superimposition of model protein and template protein using Chimera software

The structural alignment of the model protein and the template protein shows 84.74% identity; the remaining region falls in the loop regions of the protein. The picture below shows the one-letter residue codes of the aligned, superimposed regions (Meng et al., 2008).

Fig. Superimposed residues of the model protein. The superimposed image of the model and the template protein shows the identical amino acid residues.

4.4 Active site prediction

An active site on a protein, also called a binding site, is where the ligand binds; it is complementary to the ligand in size, shape, charge, and hydrophobic or hydrophilic character.

Active site: MetaPocket

The identification of ligand-binding sites is often the starting point for protein function annotation and structure based drug design. 
Many computational methods for the prediction of ligand-binding sites have been developed in recent decades. MetaPocket is a consensus method in which the sites predicted by four methods (LIGSITE, PASS, Q-SiteFinder and SURFNET) are combined to improve the prediction success rate. These methods were evaluated on two datasets, one of 48 unbound/bound structures and one of 210 bound structures, and the comparison showed that metaPocket improves the success rate from about 70% to 75% at the top-1 prediction. MetaPocket is available at http://metapocket.eml.org.

Lipinski Filter

Out of 95 PubChem IDs of natural compounds collected from the literature, only 41 compounds followed Lipinski's Rule of Five, of which 30 were bioactive compounds. The Lipinski-compliant compounds are listed in Table 4.3. The filtration was performed within the PubChem database itself. Derivatives and analogues of all these compounds were then collected, and docking studies were carried out against the generated nucleoprotein model.
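The Lipinski filter described above was run inside PubChem, so the following is only a minimal sketch of the rule itself, not the procedure used in this work. It assumes the four descriptors have already been computed by some tool. The names `Descriptors`, `lipinski_violations` and `passes_lipinski` are hypothetical, and the default of zero tolerated violations is an assumption chosen to match the Yes/No column of the top-hit table in Section 4.5.1.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float            # octanol-water partition coefficient


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the Rule of Five criteria that the compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.h_bond_donors > 5:
        violations.append("more than 5 H-bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    if d.log_p > 5:
        violations.append("LogP > 5")
    return violations


def passes_lipinski(d: Descriptors, max_violations: int = 0) -> bool:
    """Assumption: a compound passes only if it breaks no rule (strict reading)."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Descriptor values taken from the top-hit table (CID 6419887 and CID 3503).
    hecogenin = Descriptors(430, 1, 4, 4.9728)
    cid_3503 = Descriptors(518, 6, 8, 6.38224)
    print(passes_lipinski(hecogenin))
    print(lipinski_violations(cid_3503))
```

Some formulations of the rule tolerate one violation; passing `max_violations=1` gives that looser reading.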
Table 4.3 Shortlisted natural compounds according to Lipinski's Rule of Five

No. | CID | Chemical name | IUPAC name | Molecular formula
1 | 326 | Cuminaldehyde | 4-propan-2-ylbenzaldehyde | C10H12O
2 | 370 | Gallic acid | 3,4,5-trihydroxybenzoic acid | C7H6O5
3 | 1203 | Epicatechin | 2-(3,4-dihydroxyphenyl)-3,4-dihydro-2H-chromene-3,5,7-triol | C15H14O6
4 | 5570 | Trigonelline | 1-methylpyridin-1-ium-3-carboxylate | C7H7NO2
5 | 6989 | Thymol | 5-methyl-2-propan-2-ylphenol | C10H14O
6 | 7461 | gamma-Terpinene | 1-methyl-4-propan-2-ylcyclohexa-1,4-diene | C10H16
7 | 8082 | Piperidine | piperidine | C5H11N
8 | 9064 | Cianidanol | (2R,3S)-2-(3,4-dihydroxyphenyl)-3,4-dihydro-2H-chromene-3,5,7-triol | C15H14O6
9 | 10205 | Plumbagin | 5-hydroxy-2-methylnaphthalene-1,4-dione | C11H8O3
10 | 10212 | Pentosalen | 9-(3-methylbut-2-enoxy)furo3,2-gchromen-7-one | C16H14O4
11 | 14896 | beta-Pinene | 6,6-dimethyl-4-methylidenebicyclo3.1.1heptane | C10H16
12 | 16793 | AC1L28P3 | (17R)-3,17-diacetyloxy-13-methyl-6,7,8,9,11,12,14,15,16,17-decahydrocyclopentaaphenanthren-16-yl acetate | C24H30O6
13 | 65036 | Allicin | 3-prop-2-enylsulfinylsulfanylprop-1-ene | C6H10OS2
14 | 68077 | Tangeretin | 5,6,7,8-tetramethoxy-2-(4-methoxyphenyl)chromen-4-one | C20H20O7
15 | 72276 | Epicatechin | (2R,3R)-2-(3,4-dihydroxyphenyl)-3,4-dihydro-2H-chromene-3,5,7-triol | C15H14O6
16 | 72281 | Hesperetin | (2S)-5,7-dihydroxy-2-(3-hydroxy-4-methoxyphenyl)-2,3-dihydrochromen-4-one | C16H14O6
17 | 72344 | Nobiletin | 2-(3,4-dimethoxyphenyl)-5,6,7,8-tetramethoxychromen-4-one | C21H22O8
18 | 160703 | Amidinomycin | (1R,3S)-3-amino-N-(3-amino-3-iminopropyl)cyclopentane-1-carboxamide | C9H18N4O
19 | 442793 | Gingerol | (5S)-5-hydroxy-1-(4-hydroxy-3-methoxyphenyl)decan-3-one | C17H26O4
20 | 689043 | Caffeic acid | (E)-3-(3,4-dihydroxyphenyl)prop-2-enoic acid | C9H8O4
21 | 969516 | Curcumin | (1E,6E)-1,7-bis(4-hydroxy-3-methoxyphenyl)hepta-1,6-diene-3,5-dione | C21H20O6
22 | 2763851 | piperidine | tert-butyl 3-(hydroxymethyl)piperidine-1-carboxylate | C11H21NO3
23 | 5280343 | Quercetin | 2-(3,4-dihydroxyphenyl)-3,5,7-trihydroxychromen-4-one | C15H10O7
24 | 5280445 | Luteolin | 2-(3,4-dihydroxyphenyl)-5,7-dihydroxychromen-4-one | C15H10O6
25 | 5280863 | Kaempferol | 3,5,7-trihydroxy-2-(4-hydroxyphenyl)chromen-4-one | C15H10O6
26 | 5281855 | Ellagic acid | - | C14H6O8
27 | 5315472 | Bisdemethoxycurcumin | (1E,6E)-1,7-bis(4-hydroxyphenyl)hepta-1,6-diene-3,5-dione | C19H16O4
28 | 5318517 | Andrographolide | (3E,4S)-3-2-(1R,4aS,5R,6R,8aS)-6-hydroxy-5-(hydroxymethyl)-5,8a-dimethyl-2-methylidene-3,4,4a,6,7,8-hexahydro-1H-naphthalen-1-ylethylidene-4-hydroxyoxolan-2-one | C20H30O5
29 | 6708794 | - | - | C27H30O7
30 | 11382524 | - | (3E,4S)-3-2-(1S,5R,6R,8aR)-6-hydroxy-5-(hydroxymethyl)-5,8a-dimethyl-2-methylidene-3,4,4a,6,7,8-hexahydro-1H-naphthalen-1-ylethylidene-4-hydroxyoxolan-2-one | C20H30O5
31 | 11528922 | Euniceasesquiterpenoid 2 | (3S,3aS,5aS,6S,9aR,9bR)-6-hydroxy-3,5a-dimethyl-9-methylidene-3a,4,5,6,7,8,9a,9b-octahydro-3H-benzog1benzofuran-2-one | C15H22O3
32 | 14492795 | Nimbaflavone | 5,7-dihydroxy-2-4-methoxy-3-(3-methylbut-2-enyl)phenyl-8-(3-methylbut-2-enyl)-2,3-dihydrochromen-4-one | C26H30O5
33 | 29927575 | - | (3R,4S)-3-(E)-2-(1R,4aS,5R,6R,8aR)-6-hydroxy-5-(hydroxymethyl)-5,8a-dimethyl-2-methylidene-3,4,4a,6,7,8-hexahydro-1H-naphthalen-1-ylethenyl-4-hydroxyoxolan-2-one | C20H30O5
34 | 44814409 | - | N-4-(1R,3S,5S)-3-amino-5-methylcyclohexylpyridin-3-yl-6-(2,6-difluorophenyl)-5-fluoropyridine-2-carboxamide | C24H23F3N4O
35 | 46946863 | Bisdemethoxycurcumin isoxazole | 4-(2Z)-2-5-(E)-2-(4-hydroxyphenyl)ethenyl-1,2-oxazol-3-ylideneethylidenecyclohexa-2,5-dien-1-one | C19H15NO3
36 | 53361971 | piperazin | 4-(2-chloro-4-nitrophenyl)piperazin-1-yl-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylmethanone | C22H21ClN4O5
37 | 54670067 | - | - | -
38 | 86287562 | Ascorbic acid | (2R)-2-(1S)-1,2-dihydroxyethyl-3,4-dihydroxy-2H-furan-5-one | HC6H7O6
39 | 101306757 | MWPUYUVXTZFYJY-CINZLWFVSA-N | - | C26H34O6
40 | 122179244 | Matteflavoside G | (2S)-2-(3,5-dihydroxy-4-methoxyphenyl)-5-hydroxy-6,8-dimethyl-7-(2S,3R,4S,5S,6R)-3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yloxy-2,3-dihydrochromen-4-one | C24H28O12

The library was extended by adding the derivatives and analogues of these compounds for which 3D structures were available; these were used for docking analysis.

4.5 Results of docking

Docking results of chemical compounds against the nucleoprotein target protein, used as standards

S.no | Literature-based compound | CID | Binding affinity
1 | Nucleozin, NUZ (4-(2-chloro-4-nitrophenyl)piperazin-1-yl-(5-methyl-3-phenyl-1,2-oxazol-4-yl)methanone) | 2863945 | -9.2
2 | (+)-(S)-2-(6-methoxynaphthalen-2-yl)-propanoic acid, Naproxen | 156391 | -8.8
3 | 1H-1,2,3-triazole-4-carboxamide, CBX | 21307 | -6.5
4 | 4-(2-chloro-4-nitrophenyl)piperazin-1-yl-3-(2-chloropyridin-3-yl)-5-methyl-1,2-oxazol-4-ylmethanone, 0MM | 0MM | -9.6
5 | 4-(5-bromanyl-3-methyl-pyridin-2-yl)piperazin-1-yl-3-(2-chlorophenyl)-5-methyl-1,2-oxazol-4-ylmethanone, 0MS | 0MS | -9.5
6 | 4-(2-chloro-4-nitrophenyl)piperazin-1-yl-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylmethanone, LGH | LGH | -8.6
7 | N-4-chloranyl-5-4-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylcarbonylpiperazin-1-yl-2-nitro-phenylfuran-2-carboxamide, BMS-885986, 0MF | 0MF | -9.8, -9.4
8 | N-4-chloranyl-5-4-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylcarbonylpiperazin-1-yl-2-nitro-phenylpyridine-2-carboxamide, BMS-885838, 0MR | 0MR | -9.8, -9.0
9 | N-4-chloranyl-5-4-3-(2-methoxyphenyl)-5-methyl-1,2-oxazol-4-ylcarbonylpiperazin-1-yl-2-nitro-phenylthiophene-2-carboxamide, BMS-883559, 0MH | 0MH | -10.1, -9.8
10 | Ingavirin | 9942657 | -5.8
11 | 3-(2-chlorophenyl)-5-methyl-1,2-oxazol-4-yl-4-(4-nitrophenyl)piperazin-1-ylmethanone | 2852179 | -8.5
12 | 10-(2-fluorophenyl)-1,3,7,7-tetramethyl-5-phenyl-8,10-dihydropyrimido1,2pyrrolo4,5-b1,4oxazine-2,4-dione | 23610130 | -9.9

Ligand | Formula | Mol. wt | H-bond donors | H-bond acceptors | LogP | Molar refractivity | Binding energy
0MF | C27 H24 Cl N5 O7 | 565.50 | 1 | 11 | 4.255999 | 143.181183 | -9.6
0MH | C27 H24 Cl N5 O6 S | 581.50 | 1 | 10 | 4.724499 | 148.792221 | -9.7
0MM | C20 H17 Cl2 N5 O4 | 462.00 | 0 | 8 | 2.678580 | 109.841866 | -8.8
0MR | C28 H25 Cl N6 O6 | 576.50 | 1 | 11 | 4.057999 | 148.710220 | -9.6
0MS | C21 H20 Br Cl N4 O2 | 474.50 | 0 | 6 | 3.959818 | 113.545471 | -8.8
LGH | C22 H21 Cl N4 O5 | 456.50 | 0 | 8 | 3.410699 | 116.519859 | -8.6
NUZ | C21 H19 Cl N4 O4 | 426.50 | 0 | 7 | 3.402099 | 109.967865 | -9.8
21307 | - | 235.00 | 1 | 3 | 2.620 | 66.181686 | -
156391 | - | 230.00 | 1 | 3 | 3.036 | 66.550789 | -
2852179 | - | 426.50 | 0 | 7 | 4.174018 | 112.898865 | -
2863945 | - | 426.50 | 0 | 7 | 4.174018 | 112.898865 | -
23610130 | - | 433.00 | 0 | 5 | 4.790699 | 119.277466 | -

4.5.1 Derivatives and analogues

PubChem ID | Binding affinity (kcal/mol) | RMSD U.B. | RMSD L.B.
326 | -5.6 | 0.00 | 0.00
370 | -5.8 | 0.00 | 0.00
932 | -8.2 | 0.00 | 0.00
1203 | -5.1 | 0.00 | 0.00
2519 | -5.7 | 0.00 | 0.00
4680 | -9.3 | 0.00 | 0.00
5570 | -5.1 | 0.00 | 0.00
6041 | -5.8 | 0.00 | 0.00
6303 | -7.4 | 0.00 | 0.00
6780 | -7.5 | 0.00 | 0.00
6989 | -6.1 | 0.00 | 0.00
7461 | -5.7 | 0.00 | 0.00
8082 | -3.7 | 0.00 | 0.00
9064 | -8.1 | 0.00 | 0.00
10205 | -7.3 | 0.00 | 0.00
10207 | -8.9 | 0.00 | 0.00
10212 | -7.9 | 0.00 | 0.00
14896 | -10.6 | 0.00 | 0.00
16793 | -9.5 | 0.00 | 0.00
37542 | -6.7 | 0.00 | 0.00
65036 | -7.5 | 0.00 | 0.00
68077 | -7.5 | 0.00 | 0.00
68111 | -8.3 | 0.00 | 0.00
72258 | -5.6 | 0.00 | 0.00
72276 | -8.3 | 0.00 | 0.00
72281 | -8.5 | 0.00 | 0.00
72304 | -7.6 | 0.00 | 0.00
72344 | -7.7 | 0.00 | 0.00
73571 | -8.1 | 0.00 | 0.00
100017 | -5.8 | 0.00 | 0.00
121896 | -7.1 | 0.00 | 0.00
160703 | -6.1 | 0.00 | 0.00
197810 | -7.0 | 0.00 | 0.00
357293 | -9.5 | 0.00 | 0.00
358832 | -7.7 | 0.00 | 0.00
439202 | -5.1 | 0.00 | 0.00
439533 | -8.3 | 0.00 | 0.00
441893 | -10.5 | 0.00 | 0.00
442793 | -6.0 | 0.00 | 0.00
445858 | -5.8 | 0.00 | 0.00
689043 | -6.0 | 0.00 | 0.00
969516 | -9.1 | 0.00 | 0.00
2763851 | -5.8 | 0.00 | 0.00
3080591 | -7.8 | 0.00 | 0.00
3201567 | -8.9 | 0.00 | 0.00
5280443 | -8.4 | 0.00 | 0.00
5280445 | -8.6 | 0.00 | 0.00
5280863 | -8.2 | 0.00 | 0.00
5281607 | -8.3 | 0.00 | 0.00
5281614 | -8.0 | 0.00 | 0.00
5281616 | -8.1 | 0.00 | 0.00
5281708 | -8.2 | 0.00 | 0.00
5281767 | -7.7 | 0.00 | 0.00
5281792 | -8.0 | 0.00 | 0.00
5281855 | -8.2 | 0.00 | 0.00
5315472 | -8.2 | 0.00 | 0.00
5318517 | -8.6 | 0.00 | 0.00
5367719 | -5.4 | 0.00 | 0.00
5708351 | -8.4 | 0.00 | 0.00
6473762 | -8.5 | 0.00 | 0.00
6708794 | -8.3 | 0.00 | 0.00
6857767 | -8.5 | 0.00 | 0.00
6912202 | -8.9 | 0.00 | 0.00
7061256 | -8.1 | 0.00 | 0.00
7061258 | -8.9 | 0.00 | 0.00
10004738 | -8.9 | 0.00 | 0.00
10411189 | -7.4 | 0.00 | 0.00
10412012 | -9.1 | 0.00 | 0.00
11382524 | -8.4 | 0.00 | 0.00
11528922 | -7.7 | 0.00 | 0.00
11703004 | -7.8 | 0.00 | 0.00
11869629 | -9.1 | 0.00 | 0.00
11869631 | -8.7 | 0.00 | 0.00
12309893 | -8.0 | 0.00 | 0.00
12313376 | -8.3 | 0.00 | 0.00
14492795 | -8.6 | 0.00 | 0.00
15411809 | -8.7 | 0.00 | 0.00
16122395 | -9.6 | 0.00 | 0.00
16122493 | -9.7 | 0.00 | 0.00
21679042 | -8.4 | 0.00 | 0.00
21679044 | -9.3 | 0.00 | 0.00
24848818 | -8.0 | 0.00 | 0.00
29927575 | -8.6 | 0.00 | 0.00
42607856 | -9.8 | 0.00 | 0.00
42607949 | -10.2 | 0.00 | 0.00
42607952 | -9.8 | 0.00 | 0.00
44349180 | -8.8 | 0.00 | 0.00
44631202 | -8.3 | 0.00 | 0.00
44814409 | -10.7 | 0.00 | 0.00
46906036 | -8.7 | 0.00 | 0.00
46907273 | -8.4 | 0.00 | 0.00
46907311 | -8.5 | 0.00 | 0.00
46946863 | -8.4 | 0.00 | 0.00
49797769 | -10.7 | 0.00 | 0.00
53361971 | -9.0 | 0.00 | 0.00
54670067 | -5.7 | 0.00 | 0.00
54678501 | -8.6 | 0.00 | 0.00
57395012 | -8.6 | 0.00 | 0.00
57395013 | -8.7 | 0.00 | 0.00
57401998 | -8.6 | 0.00 | 0.00
57403747 | -8.6 | 0.00 | 0.00
58771499 | -8.8 | 0.00 | 0.00
66560707 | -9.0 | 0.00 | 0.00
70696403 | -7.1 | 0.00 | 0.00
86287562 | -8.8 | 0.00 | 0.00
99720265 | -8.2 | 0.00 | 0.00
101306757 | -8.9 | 0.00 | 0.00
101713209 | -8.8 | 0.00 | 0.00
101792074 | -8.9 | 0.00 | 0.00
102434921 | -9.8 | 0.00 | 0.00
122179244 | -9.3 | 0.00 | 0.00

4.5.1.1 Top hit compounds with high binding affinity

Lipinski-compliant compounds and lead-like molecules according to MetaPocket configurations

CID | MW | H-bond donors | H-bond acceptors | LogP | Molar refractivity | Binding affinity | Lipinski (Yes/No)
4680 | 339 | 0 | 5 | 3.86 | 97.199 | -9.3 | Yes
357293 | 310 | 0 | 4 | 1.95948 | 79.685 | -9.5 | Yes
441893 | 430 | 2 | 4 | 4.6847 | 119.091 | -10.5 | Yes
73607 | 448 | 5 | 10 | 0.286 | 107.812 | -9.3 | Yes
46881227 | 418 | 5 | 9 | 0.2774 | 101.26 | -9.9 | Yes
42607856 | 402 | 4 | 8 | 1.305 | 99.8487 | -9.8 | Yes
42607949 | 418 | 4 | 9 | 0.9251 | 101.806 | -10.2 | Yes
107971 | 416 | 5 | 9 | 0.1871 | 101.879 | -9.5 | Yes
42607952 | 432 | 4 | 9 | 1.3136 | 106.401 | -9.8 | Yes
49797769 | 428 | 1 | 4 | 4.8929 | 118.091 | -10.7 | Yes
10836928 | 442 | 2 | 5 | 3.7838 | 119.457 | -11.3 | Yes
6419887 | 430 | 1 | 4 | 4.9728 | 118.115 | -11.4 | Yes
11385250 | 444 | 2 | 5 | 4.0078 | 119.551 | -11 | Yes
102434921 | 432 | 4 | 9 | 1.3136 | 106.401 | -9.8 | Yes
14896 | 136 | 0 | 0 | 2.9987 | 43.752 | -10.6 | Yes
44814409 | 440 | 3 | 5 | 4.85409 | 114.808 | -10.7 | Yes
3503 | 518 | 6 | 8 | 6.38224 | 145.306 | -10.6 | No
10621 | 610 | 8 | 15 | -1.1566 | 140.699 | -9 | No
72277 | 306 | 6 | 7 | 1.2517 | 74.2878 | -9.4 | No
99474 | 414 | 1 | 3 | 5.7139 | 117.701 | -9.1 | No
114777 | 564 | 9 | 12 | 2.087 | 139.678 | -10.1 | No
442431 | 580 | 8 | 14 | -1.1652 | 134.147 | -10.8 | No
1794427 | 354 | 6 | 9 | -0.6459 | 82.5188 | -9.2 | No
5280805 | 610 | 10 | 16 | -1.8788 | 137.495 | -9 | No
11250133 | 578 | 10 | 12 | 2.995 | 143.385 | -9 | No
44349180 | 332 | 2 | 4 | 3.9484 | 92.7201 | -9.1 | Yes
11999968 | 594 | 9 | 14 | 3.5417 | 142.358 | -9.5 | No

4.5.1.2 ADMET properties checking of top hit compounds

Columns LR5 and CMC report drug likeness; Caco-2, HIA and PPB report the ADME prediction; the Ames test and the rat and mice carcinogenicity models report the toxicity prediction.

S.No. | CID | Binding energy | Mol. formula | Chemical name | Type | Source | LR5 | CMC | In vitro Caco-2 cell (nm/sec) | HIA (more than 90%) | PPB (less than 90%) | Ames test | Carcinogenicity, rat model | Carcinogenicity, mice model
1 | 6419887 | -11.4 | C27H42O4 | Hecogenin | Organic compound | Sisal plant | Stable | 1 Failed | Moderate | Well absorbed | High | Non-mutagen | Passed | Failed
2 | 10836928 | -11.3 | C27H38O5 | Muzanzagenin | Sapogenin | Roots of Asparagus africanus Lam. (Liliaceae) | Stable | Qualified | Moderate | Well absorbed | Low (Passed) | Non-mutagen | Passed | Passed
3 | 11385250 | -11 | C27H40O5 | Asparacosin A | Saponins | Roots of Asparagus meioclados | Stable | 1 Failed | Moderate | Well absorbed | High | Non-mutagen | Passed | Passed
4 | 44814409 | -10.7 | C24H23F3N4O | - | - | - | Stable | Qualified | Moderate | Well absorbed | High | Mutagenic | Failed | Passed
5 | 49797769 | -10.7 | C27H40O4 | - | - | - | Stable | Failed | Moderate | High | High | Non-mutagen | Passed | Passed

4.6 Protein-ligand interaction analysis of lead-like compound

The protein-ligand interaction was analysed using Discovery Studio software, which shows the ligand, coloured by compound properties, interacting with residues labelled by their three-letter codes. The main interactions are shown as green bonds together with their specific distances. All the other interactions are also favorable.

Fig. 2D protein-ligand complex representing specific amino acids and bond types

Conventional hydrogen bond interaction: UNL 1:H - SER 457:O (2.34 Å).
Conventional hydrogen bond interaction: THR 390:OG1 - UNL 1:O (2.82 Å).

CONCLUSION

The proteins of influenza A virus vary considerably from one subtype to another. This is most marked in the membrane proteins, which play important roles and are treated as important targets against influenza: hemagglutinin is 35% conserved within the same subtype but only 15% conserved among different subtypes. In this work, MSA of the different proteins of influenza A viruses was performed and showed that the internal proteins are more conserved than the membrane proteins. The membrane proteins are the initial antigenic targets, but they mutate rapidly; MSA of the other proteins therefore concluded that nucleoprotein is highly conserved and is also a functionally important protein. The results were validated by sequence-structure analysis using the BLAST program, which also confirmed the conservation of nucleoprotein.
It is also the most abundantly produced protein in influenza viruses, which suggests it as a potent druggable target. Because most cases reported in 2017 were caused by the A/H3N2 virus, against which vaccines were less effective, the H3N2 strain was selected in this work; no 3D structure for it was found in the Protein Data Bank. Homology modelling was therefore performed for the query sequence ARQ85817.1, for which BLASTp was run. Hits with 100% query coverage and identity above 90% were used as templates for model building with Modeller. The alignment of the template and query sequences was good, and the model with the lowest DOPE score was selected as the best model. The model was validated using Rampage, which placed 98% of the residues of the modelled protein in the favoured region, indicating both the stability and the quality of the model. The ProSA server gave the Z-score of the modelled protein, which indicates the overall quality of the model relative to all the X-ray and NMR structures present in the PDB. ProSA reports its result relative to the length of the protein sequence, and the Z-score of -9.17 for a length of 498 residues compared well with the other models. ERRAT evaluates residue energies over a window of 9 residues at a time; its result of around 86.5% is also a satisfactory output. Preparing a ligand library against nucleoprotein was challenging, so all the literature-based standard druggable molecules available were collected from resources such as PubChem, DrugBank, BindingDB and the PID database. The best binding energy among these compounds was set as the standard, and natural small-molecule compounds were screened for better binders against nucleoprotein. A literature-based library of natural compounds was prepared from compounds that had mainly been used against other classes of target.
They were therefore tested as novel molecules against nucleoprotein. The molecules were screened for drug-likeness, and the active compounds that followed Lipinski's Rule of Five were used for docking studies; out of 95 literature-based compounds, only a handful were used for docking analysis. Derivatives and analogues were also collected. All drug-like molecules were docked using AutoDock Tools. The docking results were tabulated, and the Lipinski-compliant molecules with good binding energy were selected for ADME property checking. The shortlisted compounds that fulfilled the drug-like and ADME properties were analysed for protein-ligand interactions. The interaction analysis showed the stability of the protein-ligand complex.
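The shortlisting step summarised above (keep Lipinski-compliant molecules, then rank by binding energy) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure actually scripted in this work: the names `Hit` and `shortlist` are hypothetical, and the records are a small subset of the values reported in the top-hit table of Section 4.5.1.1.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One docked compound (hypothetical record type)."""
    cid: int
    binding_affinity: float  # kcal/mol; more negative means stronger predicted binding
    lipinski: bool           # True if the compound follows Lipinski's Rule of Five


def shortlist(hits: list[Hit], top_n: int = 3) -> list[Hit]:
    """Keep Lipinski-compliant hits and return the top_n strongest binders."""
    compliant = [h for h in hits if h.lipinski]
    # Ascending sort puts the most negative (best) binding affinity first.
    return sorted(compliant, key=lambda h: h.binding_affinity)[:top_n]


# Subset of the reported top hits, deliberately left unsorted.
HITS = [
    Hit(442431, -10.8, False),
    Hit(49797769, -10.7, True),
    Hit(11385250, -11.0, True),
    Hit(3503, -10.6, False),
    Hit(6419887, -11.4, True),
    Hit(14896, -10.6, True),
    Hit(44814409, -10.7, True),
    Hit(10836928, -11.3, True),
]

if __name__ == "__main__":
    for hit in shortlist(HITS, top_n=5):
        print(hit.cid, hit.binding_affinity)
```

With `top_n=5`, this subset yields the same five CIDs that were carried forward to the ADMET check; CID 442431 is excluded despite its binding energy of -10.8 because it does not follow the Rule of Five.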