Saturday, December 24, 2016

A Proposal for Big Data Analytics in Health Care

Introduction
With cloud, mobile, and streaming computing, wireless automation, and Web technologies driving a competitive, data-driven, Internet-based economy, data has become abundant and ubiquitous across both public and private sectors (Gartner, 2013; Information Builder, 2013). Health care data such as PHI (personal health information), PII (personally identifiable information), EHR (electronic health records), and EMR (electronic medical records) grows exponentially in a healthcare setting and is never at rest. When patients see clinicians or physicians for diagnostic tests, information about these visits and procedures flows among healthcare providers, healthcare systems, health insurers, and healthcare networks. Other data includes enrollment information, physician credentialing, appointments, fee payment schedules, medical images, and care management documentation. This health care data in motion drives the need for analytics to improve healthcare services, reduce costs, enhance R & D in medicine, predict the probability of disease occurrence, and enable actionable decision-making (Information Builder, 2013; Schneiderman, Plaisant, & Hesse, 2013).
This individual project in Unit 5 (U5 IP) provides a design proposal for a Hadoop analytics solution applied to the business problem of analyzing various large data sets of PHI, PII, EHR, or EMR in a hospital network system. The design proposal of the Cyber-Healthcare System gives corporate management descriptive information and guidance on the architecture of a project that solves the business problem of analyzing huge sets of scattered, complex data in the healthcare industry in the North-East region, e.g., New Hampshire (NH), Massachusetts (MA), Rhode Island (RI), and Connecticut (CT). The document also gives other readers a general overview of the proposal for big data analytics in healthcare for the retrieval of insights. It covers the following four sections:
I. Network Architecture
II. NoSQL Database - Cassandra
III. Business Plan
IV. Security Policy Proposal
            Each section will describe detailed information in data science for big data analytics in health care.
I. Network Architecture
The design proposal of the Cyber-Healthcare System (The System) provides an overall description of the solution in data analytics to solve the challenges and business problems in the healthcare field.
     A. Hadoop Ecosystem
The Cyber-Healthcare System will implement and deploy the Hadoop ecosystem to hospitals in the area. As the de facto standard to manage big data, Apache Hadoop - an open source Java-based framework that uses parallel data processing across distributed clusters - is chosen for this project (Apache Software Foundation, 2014). A simplified Hadoop architecture includes four major components:
          1. Hadoop Common
The component consists of Java libraries and utilities to support other components.
          2. Hadoop YARN
The component schedules jobs and manages cluster resources.
          3. HDFS (Hadoop Distributed File System)
The HDFS provides high-throughput access to application data.
          4. Hadoop Map/Reduce
It performs Map and Reduce functions on large data sets in parallel processing to retrieve insightful health information for patients, clinics, and hospitals.
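The Map and Reduce steps described above can be sketched in plain Python. This is only an in-memory illustration of the programming model, not Hadoop code: the visit records, the function names, and the choice of counting visits per diagnosis are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical visit records: (patient_id, diagnosis). Illustrative data only.
visits = [
    ("p1", "vertigo"),
    ("p2", "asthma"),
    ("p3", "vertigo"),
    ("p1", "asthma"),
    ("p4", "vertigo"),
]

def map_phase(record):
    """Map: emit one (diagnosis, 1) pair per visit record."""
    _patient_id, diagnosis = record
    yield diagnosis, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one diagnosis."""
    return key, sum(values)

mapped = [pair for record in visits for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'vertigo': 3, 'asthma': 2}
```

In a real cluster the map and reduce functions would run in parallel on different nodes; here they run sequentially to keep the sketch self-contained.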

Figure 1 shows a simplified Hadoop framework with four components: YARN Frameworks, Common Utilities, HDFS, and Map/Reduce Computation.
Source: Adapted from Hadoop Software Foundation, 2012.


Figure 2 displays a high-level HDFS architecture with name node and multiple data nodes in data processing.
Source: Adapted from Borthakur (Apache Hadoop Organization, 2012).
                Healthcare tools in the Cyber-Healthcare System are designed to assist health authorities with long-term plans, business strategies, and healthcare policies. They include diagnostic tools for monitoring, evaluating, and assessing data. Other tools support priority scheduling, identification of effective strategies, cost evaluation, resource planning, budget calculation, and the programming and implementation of tasks.
     B. External interfaces
            The Cyber-Healthcare System will allow users such as frontline healthcare providers, nurses, physicians, and administrators to enter data or to view and search health information in the Hadoop ecosystem. Patients can access and view only their own health records. Administrators and designers, however, have full privileges and authorization to read, write, delete, save, or change data and files.
          1. User interface
            A single graphical user interface (GUI) serves four types of users, each with a different privilege level.
               - Patients have basic privileges: they can read, view, and print their individual health records and information. They can also schedule appointments, send emails with questions, etc.
               - Nurses, physicians, and data entry workers have more privileges: they can read and view health information, enter data, search or query for useful information, etc.
               - Administrators and designers have full privileges: they can read, write, delete, change, query, extract data, etc.
               - Medical researchers have high-level privileges to access data and information in the System for their work.
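The four privilege levels above could be expressed as a simple role-to-action map. The sketch below is a minimal illustration: the role names, the action names, and in particular the exact action set given to researchers are assumptions, since the proposal only describes the levels informally.

```python
# Hypothetical privilege sets derived from the four user types described above.
# The researcher set is an assumption; the proposal only says "high-level" access.
PRIVILEGES = {
    "patient": {"read", "print", "schedule", "email"},
    "clinician": {"read", "enter", "search", "query"},
    "researcher": {"read", "search", "query"},
    "administrator": {"read", "write", "delete", "change", "query", "extract"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in PRIVILEGES.get(role, set())

print(is_allowed("patient", "read"))          # True
print(is_allowed("patient", "delete"))        # False
print(is_allowed("administrator", "delete"))  # True
```

Denying unknown roles by default keeps the check safe if a new user type is added to the GUI before its privileges are defined.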
          2. Hardware interface and software interface
            With a Hadoop environment as its backbone, the Cyber-Healthcare System supports NoSQL databases, aggregate data models, and key-value databases to perform map-reduce computing and to store the results of the mappers and reducers in materialized views with high fault tolerance. Users can work with industry-standard formats such as XML, JSON, and text for complex, semi-structured data.
            The hardware interface comprises personal computers, desktops, laptops, Smartphones, iPhones, iPads, etc. (Natarajan, 2012).
            Software interface includes:
               a. Platforms:
                    - OS Windows 7, 8, 8.1, 10 (32-bit, 64-bit)
                                       Windows Server 2008 (64-bit)
                                       Windows Server 2012 (64-bit)
                                       Windows Server 2012 R2 (64-bit)
                                       Windows Vista SP1 and later (32-bit and 64-bit)
                     - Mac OS X hosts (64-bit)
                                        Mavericks: 10.9
                                        Yosemite: 10.10
                                        El Capitan: 10.11
                     - Linux hosts (32-bit or 64-bit)
                                        Ubuntu 10.04 to 16.04
                                        Debian GNU/Linux 6.0 (“Squeeze”) and 8.0 (“Jessie”)
                                        Oracle Enterprise Linux 5, Oracle Linux 6 and 7
                                        Redhat Enterprise Linux 5, 6 and 7
                                        Fedora Core / Fedora 6 to 24
                                        Gentoo Linux
                                        openSUSE 11.4 to 13.2
                     - Solaris hosts (64-bit only)
                                        Solaris 11
                                        Solaris 10 (U10 and higher)                                      
               b. Emulated hardware
                     - Input devices: Standard PS/2 keyboards and mouse
                     - Graphics: Standard VGA devices
                     - Storage: Intel PIIX3/PIIX4 chips, the SATA (AHCI) interface, and two SCSI adapters (LSI Logic and BusLogic)
                     - Networking: Linux kernels version 2.6.25 or later
                                            Windows 2000, XP and Vista, drivers
                     - USB: xHCI, EHCI, and OHCI
     C. Data flow diagrams
            Healthcare data is processed in parallel by both the traditional databases of the data warehouse and the advanced Hadoop subsystem during the ETL (Extract, Transform, and Load) process, as shown in Figures 3 and 4 below:

            Figure 3: Process flow of the large healthcare data in Map/Reduce functions in Hadoop subsystem.
Source: Adapted from Intel, 2016.


            Figure 4: Data flow in both traditional data warehouse and Hadoop subsystem in parallelism in the Cyber-Healthcare System. 
Source: Adapted from Intel, 2016.

            In the Cyber-Healthcare System, a data flow document can be written in XML. For example, a typical XML design document looks as follows:
     <?xml version="1.0"?>
     <!-- File name: TheCyberHealthcareSystem.xml -->
          <Group>
               <Groupname>Arizona</Groupname>
                    <Hospital>XXX</Hospital>
                         <DeptInternalMedicine>AAA</DeptInternalMedicine>
                         ….
                         <DeptIntensiveCare>BBB</DeptIntensiveCare>
                         …..
                         <DeptFamilyCare>BBB</DeptFamilyCare>
                         ……..
                    ….
                    <Clinic>YYY</Clinic>
                    …..
                    <Nursinghome>ZZZ</Nursinghome>
                    ……
               <Groupname>Colorado</Groupname>
                    <Hospital>III</Hospital>
                         <DeptInternalMedicine>OOO</DeptInternalMedicine>
                         ….
                         <DeptIntensiveCare>PPP</DeptIntensiveCare>
                         …..
                         <DeptFamilyCare>QQQ</DeptFamilyCare>
                         ……..
                    ….
                    <Clinic>JJJ</Clinic>
                    …..
                    <Nursinghome>KKK</Nursinghome>
               …..  
          </Group>

     <Patient>
          <Patientname>StevenConte</Patientname>
               <PatientID>7742661926</PatientID>
               <PatientDOB>04301975</PatientDOB>
               <PatientAddress>XYZ</PatientAddress>
               <PatientOccupation>zyx</PatientOccupation>
               <PatientAge>46</PatientAge>
               <PatientHeight>5ft8Inch</PatientHeight>
               <PatientWeight>150</PatientWeight>
               <PatientIllness>Vertigo</PatientIllness>
               …………
     </Patient>
     ………..
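As a rough illustration of how such a patient record could be consumed, the following Python sketch parses a small, well-formed excerpt with the standard library. The excerpt keeps only a few of the elements from the sample above and treats `<Patient>` as the document root, which is an assumption made so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# A well-formed excerpt of the sample patient record (subset of fields only).
PATIENT_XML = """<?xml version="1.0"?>
<Patient>
  <Patientname>StevenConte</Patientname>
  <PatientID>7742661926</PatientID>
  <PatientDOB>04301975</PatientDOB>
  <PatientIllness>Vertigo</PatientIllness>
</Patient>"""

root = ET.fromstring(PATIENT_XML)

# Flatten the child elements into a tag -> text dictionary.
record = {child.tag: child.text for child in root}

print(record["PatientID"], record["PatientIllness"])  # 7742661926 Vertigo
```

A parser rejects mismatched or misspelled tags, so running a check like this against each document is a cheap way to catch malformed records before they enter the data flow.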
     D. Overall system diagrams
            The overall Cyber-Healthcare System is based on a Hadoop ecosystem that consists of Hadoop YARN, the Common Utilities unit, HDFS, Hadoop Map/Reduce, and the NoSQL database Cassandra. Hadoop YARN is the communication and control unit that provides job scheduling and cluster resource management. The Common Utilities unit is a supporting unit that provides libraries and utilities. HDFS provides access to health data sets. Hadoop Map/Reduce applies parallel processing to the large healthcare data sets. Data sets of terabyte scale are broken into blocks of 64 or 128 MB and stored on multiple low-cost commodity nodes in HDFS, where Map and Reduce functions retrieve insightful information for end users such as patients and frontline care providers (e.g., nurses, physicians, healthcare technologists, etc.). The column-oriented database Cassandra stores the healthcare data that is fed to HDFS for processing.
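To give a sense of scale for the block sizes mentioned above, the short calculation below counts how many blocks a one-terabyte data set would occupy. It assumes binary units (1 TB taken as 2^40 bytes, 1 MB as 2^20 bytes) and ignores replication; the function name is hypothetical.

```python
TIB = 1024 ** 4  # bytes in one terabyte, assuming binary units
MIB = 1024 ** 2  # bytes in one megabyte, assuming binary units

def hdfs_blocks(file_bytes: int, block_bytes: int) -> int:
    """Number of blocks needed to hold a file (integer ceiling division)."""
    return -(-file_bytes // block_bytes)

print(hdfs_blocks(TIB, 128 * MIB))  # 8192
print(hdfs_blocks(TIB, 64 * MIB))   # 16384
```

Halving the block size doubles the number of blocks to track, which is one reason larger blocks are attractive for very large files.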

            Figure 5 shows the overall central Hadoop System with the control unit Hadoop YARN, supportive unit Common Utilities, and Cassandra in the Cyber-Healthcare System.
Source: Created by ThienSiLe (TSL), December 2016
     E. Communication flow chart
For communication, the Cyber-Healthcare System consists of the central Hadoop ecosystem connected to four groups, i.e., New Hampshire, Massachusetts, Rhode Island, and Connecticut, in a star configuration as shown in Figure 6 below. Each group is linked to local hospitals, outpatient clinics, nursing homes, and rehabilitation centers. Each organization has many departments. Each department has its own care team of frontline care providers, including physicians, nurses, and family members who provide health care services to patients. In addition, the environment group, which comprises regulators, Medicare, Medicaid, insurance companies, healthcare purchasers, and research funders, can communicate with institutions such as hospitals, clinics, nursing homes, rehabilitation centers, etc.
Figure 6 depicts a high-level communication flow chart among agencies in the Cyber-Healthcare System.
Source: Created by TSL, December 2016

II. Database
The Cyber-Healthcare System can take advantage of Hadoop HDFS as an inexpensive option for storing healthcare big data. However, because of the colossal data sets, including patient medical activities, clinical health information such as electronic health records, and public health data in the North-East region, the column-oriented database Cassandra is more suitable as the storage database in the Hadoop ecosystem. Note that Hadoop HDFS still serves as distributed storage for data processing in the system.
     A. Health data
            Health data is a part of big data, the broad and complex data sets that include both structured and unstructured data. Healthcare data can be individual health records, health information, patient-related blogs, mobile data, PDF files, web log data, forums, website content, spreadsheets, photos, clickstream data, RSS feeds, word processing documents, medical scanning images, videos, audio files, RFID tags, social media data, XML patient data, call center transcripts, etc. The sources of healthcare data include electronic patient record systems, the Internet, search engines, online social media networks such as Facebook and Twitter, and medical information exchanged and posted by millions of people, e.g., patients, healthcare providers, disease researchers, etc., in hospitals, clinics, nursing homes, and rehabilitation centers across the globe. Data sets can be replicated and shared among nodes in the scalable distributed clusters of the Cassandra database system.
Insightful information that is extracted from healthcare data can be categorized into three categories (Schneiderman, Plaisant, & Hesse, 2013):
          1. Personal health information
Physicians and patients collect information about their practice and their own health habits.
          2. Clinical health information
Electronic health record systems can improve the care and treatment of patients and provide useful insights into practical patterns of treatment.
          3. Public health information
A large quantity of public health data is collected to assist policy makers in more reliable decisions.
     B. Database Framework
            Cassandra is a non-relational data storage system that does not require a relational schema or joins and that relaxes some of the ACID (Atomicity, Consistency, Isolation, and Durability) properties (Sadalage, & Fowler, 2012). Cassandra can handle big data, particularly healthcare data, in massive volume and various forms, and can process it quickly to yield useful insights. As a contemporary column-oriented NoSQL database, Cassandra uses a Bigtable-style data model with a two-level aggregate structure. The first-level key is a row identifier that selects a map of more detailed values. The second-level values are columns, each consisting of a column key and a data value. An aggregate is a collection of data that users interact with as a unit. Data in a column-oriented database is thus organized by both row and column, so that users can treat each row as an aggregate and the columns as units of data within that row aggregate.
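The two-level aggregate structure described above can be pictured as a map of maps. The Python sketch below models it with nested dictionaries; it is an in-memory analogy rather than Cassandra code, and the row keys, column keys, and values are hypothetical.

```python
# First level: row key -> row aggregate.
# Second level: column key -> data value within that row.
# All keys and values here are hypothetical illustrations.
column_family = {
    "patient-001": {
        "name": "A. Example",
        "illness": "Vertigo",
        "visit:2016-12-01": "checkup",
    },
    "patient-002": {
        "name": "B. Example",
        "illness": "Asthma",
    },
}

def get_row(row_key):
    """Fetch the whole row aggregate (all columns) for one row key."""
    return column_family.get(row_key, {})

def get_column(row_key, column_key):
    """Fetch a single column value from within a row aggregate."""
    return get_row(row_key).get(column_key)

print(get_column("patient-001", "illness"))  # Vertigo
print(sorted(get_row("patient-002")))        # ['illness', 'name']
```

Note that the two rows hold different sets of columns, which mirrors the flexibility of the column-family model compared with a fixed relational table.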
With personal health information such as patient medical activities, clinical health information such as electronic health records, and public health data stored in the column-oriented database Cassandra, big data analytics (BDA) holds the promise of improving the quality of healthcare delivery and the potential to enhance patient care, save lives, and lower treatment costs. BDA on healthcare data in Cassandra plays a crucial role in improving the quality of healthcare, patient treatment, and disease prevention. The promise includes supporting a wide range of medical and healthcare activities in physicians’ offices, hospital networks, outpatient clinics, etc. The potential includes improving patient care and saving lives at lower cost by applying big data to clinical operations, research and development, public health, evidence-based medicine, genomic analytics, pre-adjudication fraud analysis, patient profile analytics, etc.
The typical findings in healthcare data include the promise and potential of the BDA in healthcare in Cassandra as shown below:
          a. The promise of supporting a wide range of medical and healthcare functions. 
          b. The potential to improve the quality of healthcare delivery at lower cost (Raghupathi, & Raghupathi, 2014).
Cassandra provides a lot of advantages to the System:
               - Because it is open source, it is inexpensive and easy to implement.
               - Data is replicated to multiple nodes and can be partitioned.
               - It is easy to distribute across the network.
               - It does not require a schema.
               - It can scale up and down.
               - The data consistency requirement (CAP) is relaxed.
     C. Cassandra is a good fit 
The column-oriented database Cassandra is a good fit for the Cyber-Healthcare System (Sadalage, & Fowler, 2012; Upadrasta, & Chungath, 2014):
(1) Event logging
With its ability to store any data structure, Cassandra is a good choice for storing patient records, patient medical activities, doctors’ visits, etc. Each event can be written as columns under a row key of the form patient ID. Because Cassandra scales writes well, it is ideal for recording patients’ activities and similar events.
 (2) Content management
Column families in Cassandra consist of many columns for data entries with tags, categories, links, trackbacks, etc. in different columns. Comments can be either stored in the same row or moved to different key spaces. Illness or diseases can be put into different column families (Confino, 2010).
 (3) Counters
In analyses of healthcare data, healthcare providers often need to count and categorize patients to calculate analytics or statistics. Cassandra supports this with counter columns, which are declared with CounterColumnType when a column family is created.
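The counting use case above can be illustrated with an in-memory stand-in. The sketch below uses Python's `collections.Counter` rather than any Cassandra API, and the department names are hypothetical; it only shows the increment-per-event pattern that a counter column supports.

```python
from collections import Counter

# Hypothetical stream of admissions, one department name per admitted patient.
admissions = ["cardiology", "oncology", "cardiology", "family care", "cardiology"]

# Each event increments the counter for its category, as a counter column would.
patients_per_department = Counter()
for department in admissions:
    patients_per_department[department] += 1

print(patients_per_department["cardiology"])   # 3
print(patients_per_department.most_common(1))  # [('cardiology', 3)]
```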
 (4) Expiring usage
Frontline healthcare providers such as physicians, nurses, technologists, etc. may want to show patient records, treatment predictions, preventive care information, etc. for a limited time. They can do that by using expiring columns. Cassandra allows information to be displayed for a given period, after which it is deleted automatically. The period is known as the TTL (Time To Live) and is defined in seconds. The column is deleted after the TTL has elapsed.
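The expiring-column behavior can be mimicked in a few lines of Python. This is a minimal in-memory sketch of the TTL idea, not Cassandra's implementation: the class name, method names, and the injectable clock are all assumptions made so the example can run without waiting for real time to pass.

```python
import time

class ExpiringColumns:
    """Toy key-value store whose entries disappear after a TTL in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the example can simulate time
        self._data = {}      # key -> (value, expiry timestamp)

    def put(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # TTL elapsed: remove the column
            return None
        return value

# Simulated clock: a one-element list that the lambda reads from.
now = [0.0]
store = ExpiringColumns(clock=lambda: now[0])
store.put("lab-result", "ready", ttl_seconds=60)
print(store.get("lab-result"))  # ready (still within the TTL)
now[0] = 61.0
print(store.get("lab-result"))  # None (TTL has elapsed)
```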
 (5) When not to use Cassandra
Cassandra should not be used for reads and writes that require ACID transactions, because consistency may not be guaranteed. Cassandra is also a poor fit for early prototypes or initial tech spikes, because the early stages may require column changes. In Cassandra, the cost of changing queries may be higher than the cost of changing the schema in a traditional relational database.
III. Business Plan
The world changes constantly and rapidly and is full of uncertainty, so it is almost impossible to predict the future from the known present. In an aggressively competitive healthcare business environment, many organizations realize that innovating on existing systems through the interaction between humans and technology, as in the Cyber-Healthcare System, is important for healthcare services and medical research and development (R & D). The business plan of the Cyber-Healthcare System, which uses big data analytics in healthcare, is as follows (VA/VHA Guidebook, 2011; Bplans, 2016):
     1. Executive summary 
            In an effort to use analytics to improve healthcare services, reduce costs, and enhance medical R & D, the Cyber-Healthcare System is proposed to hospitals, outpatient clinics, nursing homes, and rehabilitation centers in four states (New Hampshire, Massachusetts, Rhode Island, and Connecticut) for family medicine practice, teaching, and research. It focuses on diagnosis and treatment for patients of all ages. The system also emphasizes preventive medicine and the wellness and health of patients.
     2. Description
            The proposed Cyber-Healthcare System uses big data analytics in health care in the North-East region and gives corporate management descriptive information and guidance on its architecture and framework. The project solves the business problem of analyzing huge sets of scattered, complex data, particularly data on healthcare services in the healthcare industry.
     3. Objectives
          a. Creating a new health care system that consolidates the obsolete and wasteful systems in North-East region.
          b. Providing high-quality healthcare to residents in the area.
          c. Innovating a medical practice that will exceed patients’ needs and expectations.
          d. Increasing the number of patients by 20% each year through better services and referrals.
          e. Providing friendly GUIs (Graphic User Interfaces) for patients, frontline healthcare providers, and professional researchers to access information and data (e.g., PHI, PII, EHR, or EMR) appropriately from the designated websites.
          f. Reducing doctors’ appointments from 40% down to 5%.
          g. Increasing the average number of visits by 20% in each state.
     4. Background
            Research data shows that most well-informed, health-conscious healthcare organizations can reduce costs, decrease absenteeism, and increase productivity. With the growing use of big data generation and analysis, IBM is a leader in the application and integration of BDA in many fields, particularly healthcare. IBM Watson employs BDA in data generation with its cognitive computing platform. It can connect massive, dynamic, and complex text data across patient records and medical literature to create hypotheses among hundreds of variables for treatment recommendations. Watson uses big data analytics to process the data of specific patients, whose health records, illness history, and physicians’ studies are examined carefully. The solution is deployed at many academic medical clinics and hospitals such as the Cleveland Clinic and the Mayo Clinic. Some institutions partner with IBM Watson in an ecosystem to provide effective treatments at lower cost (Power, 2015). Another example is USAA, the financial services provider that uses IBM Watson to assist about 15,000 members each year in their transition from military to civilian life. USAA and IBM Watson use a software tool, Enhanced Virtual Assistant, or EVA, to execute transactions such as transferring money, paying bills, and communicating, drawing on BDA’s data generation and analysis across their 140 products.
     5. Organizational Assessment
            The current hospitals have supported patient health promotion, but the programs have failed to improve health outcomes and revenue. A recent survey of about 1000 patients and EHRs indicates that patients’ wellness and health are declining (VA/VHA Guidebook, 2011).
          a. 56% of the patients have a BMI (body mass index) over 25; 23% of patients are obese (BMI > 30).
          b. 25% of patients report that they use tobacco.
          c. 67% of patients report that they are under stress in the workplace.
          d. 95% of patients report that they want a wellness program.
          e. The average claim of $442 is reported on back injuries.
            Note that with current national health trends, the percentage of overweight and obese patients will continue to rise, bringing more compensation claims.
          6. Proposed Services
            Healthcare organizations that participate in the Cyber-Healthcare System will focus on a comprehensive patient health promotion program to prevent the cost rise, enhance the wellness and health of the patients:
               - Detecting the diseases at earlier stages.
               - Managing individual and population health efficiently.
               - Detecting healthcare fraud more quickly.
               - Estimating outcomes from large amounts of historical data, such as length of stay, patients who choose elective surgery, and patients who do not benefit from surgery.
               - Identifying patients at risk of medical complications.
               - Identifying patients at risk of advancing to later disease stages.
               - Identifying causal factors of illness progression.
               - Pinpointing patients who are the greatest consumers of health resources.
               - Providing patients with the information for making informed decisions.
               - Helping patients manage their own health.
               - Tracking healthier behaviors.
               - Identifying effective treatments.
               - Reducing readmissions by identifying lifestyle factors that increase the risk of an adverse event.
               - Improving outcomes by examining vitals from at-home health monitors.
               - Managing population health by detecting vulnerabilities within the patient population during disease outbreaks.
          7. Target Market Analysis
            Among the current patients in the four states (NH, MA, RI, and CT), 57% are female. The median age is 50, with the following age distribution: (1) 400,000 patients between 20 and 29, (2) 800,000 patients between 30 and 39, (3) 800,000 patients between 40 and 49, (4) 1,600,000 patients between 50 and 59, and (5) 600,000 patients at 60+ (VA/VHA Guidebook, 2011).
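As a quick sanity check on the figures above, the following sketch totals the age groups and finds the bracket that contains the median patient. The group labels and function name are illustrative; the counts are the ones stated in the text.

```python
# Age distribution as stated above: (bracket label, number of patients).
age_groups = [
    ("20-29", 400_000),
    ("30-39", 800_000),
    ("40-49", 800_000),
    ("50-59", 1_600_000),
    ("60+", 600_000),
]

def median_bracket(groups):
    """Return the first bracket at which the running total reaches half of all patients."""
    half = sum(count for _, count in groups) / 2
    running = 0
    for label, count in groups:
        running += count
        if running >= half:
            return label

total = sum(count for _, count in age_groups)
print(total)                       # 4200000
print(median_bracket(age_groups))  # 50-59
```

The median falls in the 50 to 59 bracket, which is consistent with the stated median age of 50.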
            The Patient Health Promotion Program (PHPP) from the Cyber-Healthcare System is open to all patients in four states on a voluntary basis. The annual assessment will be performed on each population group for specific services with unique health needs.
          8. Marketing Plan
            New Patient Orientation, State Human Resources, and Occupational State Health will conduct awareness classes to target patients, distribute information materials such as booklets, bulletins, etc. to all of them through email broadcasts, signage, newspapers, and fliers. A Website with an events calendar is available to the patients.    
          9. Resources
            Resources for implementing the Cyber-Healthcare System including hardware, software, computers, workstations, facilities, administrators, and technical support groups are operated and controlled by the State Public Health Offices and Board of Trustees. Additional resources are available in annual assessments. In-kind resources such as Reproduction, Information Resources, Medical Media to support the PHPP program are anticipated.    
IV. Security Policy Proposal
            The Cyber-Healthcare System complies with all regulations, policies, and governance for the medical healthcare industry in its design as follows:
     A. Regulations
In practice, market research and ethics in healthcare data based on Internet technology are usually at odds with each other. Big Data Analytics (BDA) gives organizations both technical and strategic capabilities to generate value from the data they store. As BI (Business Intelligence) and BDA flourish, there will be more security violations and privacy issues (Quora, 2014). There is a prominent risk of violating personal privacy. For example, terrorists could hack healthcare systems such as the Cyber-Healthcare System to sabotage them, harm people, and advance their own ideological, political, or religious aims.
Big Data’s vital role in advancing medicine can be hindered by HIPAA (the US Health Insurance Portability and Accountability Act) and its Security, Privacy, and Breach Notification Rules that regulate medical information. These laws, rules, and guidelines restrict and govern the disclosure, security, collection, maintenance, and transmission of PHI, including electronic PHI, used by healthcare providers, health insurers, or medical R & D groups. PHI may include social security number, driver’s license number, account number, photographs, credit or debit card number, required security code, access code, password, medical information, health insurance information, username, security questions, etc. HIPAA also requires a covered entity to provide a notice of its privacy practices. The US government has enforcement authority. Note that HIPAA does not apply to health information that is not personally identifiable, such as aggregate data in NoSQL databases. It also does not apply to health information used by individuals or entities that fall outside the definition of covered entities in the HIPAA Act (Practical Law, 2016). Many countries have their own healthcare laws; in the US, HIPAA appears to be one of the strongest frameworks for healthcare data governance.
The System takes these issues (data privacy, security breaches) seriously and uses the latest antivirus software, firewalls, etc. to protect the integrity of data and safeguard patients’ information. The System complies with the government’s in-depth, if controversial, regulations and obeys all medical rules. Note that the Cyber-Healthcare System will work to obtain International Organization for Standardization ISO 9001 certification in the healthcare industry (Nolan, 2015).
     B. Privacy Policies
            Information about users’ usages of the website is collected by using a tracking cookie, and server access logs. The collected information includes the following:
          a. The IP address from which the user accesses the website.
          b. The type of operating system (OS) and browser the user uses to access the System site.
          c. The date and time the user accesses the Cyber-Healthcare System site.
          d. The HTML pages the user visits.
          e. The address of the page from which the user followed a link to the System site.
            Some of the information is gathered by using a tracking cookie set by the Hadoop Analytics or Google Analytics service in the privacy policy. Users may refer to their browser documentation for instructions on how to disable the cookie if they do not want to share the data with Hadoop or Google.
The Cyber-Healthcare System respects the patients’ rights. Patients have the following typical rights:
            - The right to receive notice of privacy practices from healthcare providers.
            - The right to see their protected health information and receive a copy.
            - The right to request changes to their records to correct errors or add information.
            - The right to have a list of PHI.
            - The right to request confidential communication.
            - The right to complain.
            The Cyber-Healthcare System gathers this information to make the website more useful and friendly to visitors and to better understand how and when the website is used. The Cyber-Healthcare System does not collect or track personally identifiable information, or associate gathered data with any personally identifying information from other sources.
            By using this website, the user consents to the collection of this data in the manner described and for the purpose of solving the challenges and business problems in the healthcare field (Apache Software Foundation, 2014; Natarajan, 2012).
     C. Governance
HIPAA is the federal Health Insurance Portability and Accountability Act of 1996. It was designed to safeguard healthcare information, help people retain health insurance, and help control administrative costs in the healthcare industry (HIPAA, 1996). On privacy, HIPAA emphasizes the protection and maintenance of personal health information in all health-related organizations. HIPAA requires that (1) frontline providers (e.g., physicians, nurses, etc.), (2) medical producers (e.g., pharmaceutical, medical device companies, etc.), and (3) payers (e.g., insurance companies) comply with all laws and rules in governance.
            The Cyber-Healthcare System complies with all HIPAA governance rules.
    D. Assumptions and limitations
            The Cyber-Healthcare System is developed and designed based on the following assumptions and limitations:
          1. Assumptions (Flower, 1999):
               - The System’s clients are patients.
               - The System’s contact with patients is high intensity, low touch.
               - Doctors are independent carriers of information and judgment.
               - Healthcare is event-driven.
               - Much of ill health will be predictable and preventable. 
               - Patients will be partners in managing their health.
               - Data in the System’s centralized repository is assumed clean, reliable, and credible.
               - All institutions such as hospitals, clinics, nursing homes, etc. use the same platform to access, view, query, and enter the large data sets in the centralized repository.
               - All frontline care providers in the care team are trained to use the System properly and professionally.
               - The System keeps all sources of time visible to the guest synchronized to a single time source, the monotonic host time.
          2. Limitations (Hortonworks, 2016)
               - Some experimental features are beta (labeled as experimental). Such beta features are provided but are not formally supported. However, users’ suggestions and feedback are welcome.
               - Poor performance with 32-bit AMD CPUs may affect Windows and Solaris platforms.
               - Poor performance with certain 32-bit Intel CPU models mainly affects Windows, Solaris, and Linux kernels.
               - NX (no execute, data execution prevention) only works on computers with a 64-bit OS.
               - Windows XP has slower transmission rates because it does not support segmentation offloading.
               - Shared folders are not supported on OS/2 computers.
     E. Justification
            The Cyber-Healthcare System is a modern state-of-the-art system in the contemporary network of hospitals, outpatient clinics, nursing homes, and rehabilitation centers in the North-East region. The System is developed to eliminate isolation among hospitals, reduce inefficiency in care management, and prevent a loss of opportunities for advancing patient treatments. The Cyber-Healthcare System is designed with the following justifications:   
          1. Centralizing the scattered sources of colossal data sets from many agencies, various hospitals, and clinics.    
          2. Transforming unreliable huge data sets with duplication and redundancy in data and information to credible and reliable data sets.
          3. Establishing a large healthcare network system in the region that allows users such as patients and frontline care providers (physicians, nurses, family members), each with different privileges, to access, view, and search the information needed for patient treatments and cures at the lowest possible cost.
          4. The Cyber-Healthcare System is implemented in a Hadoop environment as described in Section I. Network Architecture above.
          5. The System’s architecture is developed based on four target elements:
               a. Patients.
               b. The care team, which consists of physicians, nurses, and family members.
               c. The organization, which includes infrastructure and resources such as hospitals, clinics, nursing homes, and rehabilitation centers.
               d. The environment, which comprises regulation, policy, and the market, such as regulators, Medicare, Medicaid, insurance companies, healthcare purchasers, research funders, etc.
          6. The System is designed to tackle the huge data sets’ challenges in the healthcare industry. Some healthcare data challenges are:
               a. Capturing data is difficult.
               b. Curation is not easy.
               c. Storage requires large amounts of memory and disk space.
               d. Sharing data is complicated.
               e. Transferring data takes a long time because of its huge size.
               f. Analysis of data requires advanced analytical tools.
               g. Presentation of the results is complex.
          7. Organizations in the System can provide better, higher-quality services based on historical data from patients' previous medical records.
          8. The System has a data visualization feature that lets users access (Schneiderman, Plaisant, & Hesse, 2013):
               - Personal health information
               - Clinical health information
               - Public health information
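Justification 2 above, turning duplicated and redundant data into reliable data sets, can be pictured with a small Python sketch. It is a simplified illustration, not the System's actual method: the record fields (`first_name`, `last_name`, `birth_date`, `source`) and the functions `normalize_key` and `deduplicate` are hypothetical. It also assumes that two records describe the same patient when name and birth date match exactly after normalization, whereas real record matching across hospitals usually needs more careful rules.

```python
from typing import Dict, Iterable, List, Tuple


def normalize_key(record: Dict[str, str]) -> Tuple[str, str, str]:
    """Build a comparison key that ignores case and surrounding whitespace."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["birth_date"].strip(),
    )


def deduplicate(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record seen for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    # Made-up records from two hypothetical facilities.
    records = [
        {"first_name": "Ana", "last_name": "Lopez", "birth_date": "1980-02-14", "source": "Hospital A"},
        {"first_name": " ana", "last_name": "LOPEZ", "birth_date": "1980-02-14", "source": "Clinic B"},
        {"first_name": "Sam", "last_name": "Reed", "birth_date": "1975-09-30", "source": "Clinic B"},
    ]
    for record in deduplicate(records):
        print(record["first_name"].strip(), record["last_name"], record["source"])
```

Keeping the first occurrence is only one possible policy; merging fields from all matching records would be another.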
Conclusion
In summary, this U5 IP document provided a design proposal for the Cyber-Healthcare System in the North-East region, which uses big data analytics in health care. The proposal covered four sections: (1) network architecture, (2) the Cassandra database, (3) business plan, and (4) security policy proposal, as follows:
     I. Network Architecture:
            The framework for using big data analytics on complex healthcare data was a Hadoop ecosystem with primary components, e.g., Hadoop Common, Hadoop YARN, Hadoop HDFS, and Hadoop MapReduce, together with the Cassandra database. The external interfaces, data flow diagrams, communication flow chart, and overall system diagram were described in detail.
     II. NoSQL Database
            Cassandra served as the critical storage foundation for big healthcare data that included PHI, PII, EHR, and EMR. It is a good fit for the Cyber-Healthcare System, where the Hadoop Distributed File System (HDFS) is in the loop of data processing.
     III. Business Plan
            The business plan addressed the executive summary, description, objectives, background, organizational assessment, target market analysis, market plan, and resources of the Cyber-Healthcare System.
     IV. Security Policy Proposal
            The security policy proposal covered practice regulations, privacy policy, data governance, justifications, and assumptions and limitations. The Cyber-Healthcare System complies with HIPAA and expects to pass ISO 9001 certification in the healthcare industry (Nolan, 2015).
The virtual Cyber-Healthcare System is a proposed Hadoop framework that applies big data analytics to huge data sets in health care to improve healthcare services, reduce costs, enhance R & D in medicine, predict the probability of the diseases’ occurrence, and enable actionable decision-making. Notice that the design of four categories (network architecture, databases, business plan, and security policy proposal) in the System exceeds the requirement of three out of five given categories.


REFERENCES
Apache Software Foundation (2014). What is Apache Hadoop? Retrieved November 08, 2015 from http://hadoop.apache.org/

Bplans (2016). Free medical and health care sample business plans. Retrieved December 06, 2016 from http://www.bplans.com/medical_and_health_care_business_plan_templates.php

Confino, J. (2010). Why you need NoSQL in your toolbox. Retrieved October 17, 2016 from http://chariotsolutions.blogspot.com/2010/01/why-you-need-nosql-in-your-toolbox.html

Flower, J. (1999). The revolution in our assumptions about healthcare. Retrieved September 08, 2016 from http://www.well.com/~bbear/assumptions.html

Gartner Group (2013). Gartner predicts business intelligence and analytics will remain a top focus for CIOs through 2017. Press Release. Las Vegas, NV. Retrieved June 4, 2015 from http://www.gartner.com/newsroom/id/2637615

HIPAA Act (1996). The federal Health Insurance Portability and Accountability Act. Retrieved October 19, 2015 from http://tn.gov/health/topic/hipaa

Information Builder (2013). Data in motion – big data analytics and healthcare. Retrieved December 04, 2016 from http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/data%20in%20motion%20big%20data%20analytics.pdf

Natarajan, R. (2012). Apache Hadoop Fundamentals – HDFS and MapReduce Explained with a Diagram. Retrieved November 01, 2015 from http://www.thegeekstuff.com/2012/01/hadoop-hdfs-mapreduce-intro/

Nolan, J. (2015). Would hospitals benefit from ISO 9001? Retrieved September 08, 2016 from http://advisera.com/9001academy/blog/2015/07/21/would-hospitals-benefit-from-iso-9001/

Power, B. (2015). Artificial intelligence is almost ready for business. Retrieved August 14, 2016 from https://hbr.org/2015/03/artificial-intelligence-is-almost-ready-for-business

Practical Law (2016). PLC - Data protection in the united states: overview. Retrieved November 21, 2016 from http://us.practicallaw.com/6-502-0467

Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.

Schneiderman, B., Plaisant, C., & Hesse, B. (2013). Improving healthcare with interactive visualization methods. Retrieved September 06, 2016 from https://www.cs.umd.edu/~ben/papers/Shneiderman2013Improving.pdf

Upadrasta, B., & Chungath, A. (2014). NoSQL, NewSQL, or RDBMS: How to choose. Retrieved October 17, 2016 from http://www.informationweek.com/big-data/big-data-analytics/nosql-newsql-or-rdbms-how-to-choose/a/d-id/1297861

VA/VHA Employee Health Promotion Disease Prevention Guidebook (2011). Sample business plan – public health. Retrieved December 06, 2016 from www.publichealth.va.gov/docs/employeehealth/12-Sample-Business.pdf