Saturday, December 24, 2016

A Proposal for Big Data Analytics in Health Care

Introduction
With contemporary cloud, mobile, and streaming computing, wireless automation, and Web technologies driving a competitive data-driven market and an Internet-based economy, data has exploded and become ubiquitous across both public and private sectors (Gartner, 2013; Information Builder, 2013). Health care data such as PHI (personal health information), PII (personally identifiable information), EHRs (electronic health records), and EMRs (electronic medical records) is generated at an exponential rate in healthcare settings and is never at rest. When patients see clinicians or physicians for diagnostic tests, data about these visits and procedures flows among healthcare providers, healthcare systems, health insurers, and healthcare networks. Other data includes enrollment information, physician credentialing, appointments, fee payment schedules, medical images, and care management documentation. This health care data in motion drives the need for analytics to improve healthcare services, reduce costs, enhance R & D in medicine, predict the probability of disease occurrence, and enable actionable decision-making (Information Builder, 2013; Schneiderman, Plaisant, & Hesse, 2013).
This individual project in Unit 5 (U5 IP) provides a design proposal for a Hadoop analytics solution applied to the business problem of analyzing various large data sets of PHI, PII, EHR, or EMR in a hospital network system. The design proposal of the Cyber-Healthcare System gives corporate management descriptive information and guidance on the architecture of a project that solves the business problem of analyzing huge sets of scattered, complex data in the healthcare industry in the North-East region, e.g., New Hampshire (NH), Massachusetts (MA), Rhode Island (RI), and Connecticut (CT). The document also gives interested readers a general overview of the proposal for big data analytics in healthcare for the retrieval of insights, covering the following four sections:
     I. Network Architecture
     II. NoSQL Database - Cassandra
     III. Business Plan
     IV. Security Policy Proposal
            Each section describes in detail the data science behind big data analytics in health care.
I. Network Architecture
The design proposal of the Cyber-Healthcare System (The System) provides an overall description of the solution in data analytics to solve the challenges and business problems in the healthcare field.
     A. Hadoop Ecosystem
The Cyber-Healthcare System will implement and deploy the Hadoop ecosystem to hospitals in the area. As the de facto standard to manage big data, Apache Hadoop - an open source Java-based framework that uses parallel data processing across distributed clusters - is chosen for this project (Apache Software Foundation, 2014). A simplified Hadoop architecture includes four major components:
          1. Hadoop Common
The component consists of Java libraries and utilities to support other components.
          2. Hadoop YARN
The component handles job scheduling and manages cluster resources.
          3. HDFS (Hadoop Distributed File System)
The HDFS provides high-throughput access to application data.
          4. Hadoop Map/Reduce
It performs Map and Reduce functions on large data sets in parallel processing to retrieve insightful health information for patients, clinics, and hospitals.
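The Map and Reduce functions described above can be illustrated with a small in-memory sketch. It is not Hadoop code: the records, the function names, and the choice of counting patients per diagnosis are all assumptions made for illustration, and a real job would read its input from HDFS blocks rather than from a Python list.

```python
from collections import defaultdict

# Hypothetical input records: (patient_id, diagnosis).
records = [
    ("p1", "vertigo"),
    ("p2", "asthma"),
    ("p3", "vertigo"),
    ("p4", "diabetes"),
    ("p5", "asthma"),
    ("p6", "vertigo"),
]


def map_phase(record):
    """Map: emit one (diagnosis, 1) pair per patient record."""
    _patient_id, diagnosis = record
    yield diagnosis, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one diagnosis."""
    return key, sum(values)


def run_job(job_records):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = [pair for record in job_records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(records)
print(counts)
```

In a cluster, many mappers would run in parallel on different blocks and the shuffle would move data between nodes; the sketch only shows the logical steps.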

Figure 1 shows a simplified Hadoop framework with four components: YARN Frameworks, Common Utilities, HDFS, and Map/Reduce Computation.
Source: Adapted from Hadoop Software Foundation, 2012.


Figure 2 displays a high-level HDFS architecture with name node and multiple data nodes in data processing.
Source: Adapted from Borthakur (Apache Hadoop Organization, 2012).
                Healthcare tools in the Cyber-Healthcare System are designed to assist health authorities with long-term plans, business strategies, and healthcare policies. They include diagnostic tools for monitoring, evaluating, and assessing data. Other tools support priority scheduling, identify effective strategies, evaluate costs, plan resources, calculate budgets, and program and implement tasks.
     B. External interfaces
            The Cyber-Healthcare System will allow users such as frontline healthcare providers, nurses, physicians, and administrators to enter data or to view and search health information in the Hadoop ecosystem. Patients can access and view only their own health records. Administrators and designers, however, have full privileges and authorization to read, write, delete, save, or change data and files.
          1. User interface
            There is a single graphical user interface (GUI) that serves four types of users:
               - Patients receive basic privileges: they can read, view, and print their individual health records and information. They can also schedule appointments, send questions by email, etc.
               - Nurses, physicians, and data entry workers receive more privileges, such as reading and viewing health information, entering data, and searching or querying for useful information.
               - Administrators and designers have full privileges, such as read, write, delete, change, query, and extract data.
               - Medical researchers have high-level privileges to access data and information in the System for their work.
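The four privilege levels above can be sketched as a simple role-to-permission table. The role names, the permission names, and the exact permissions granted to researchers are assumptions chosen for illustration; the proposal does not define them at this level of detail.

```python
# Hypothetical roles and permissions derived from the four user types above.
ROLE_PERMISSIONS = {
    "patient": {"read", "print", "schedule", "email"},
    "clinician": {"read", "enter", "query"},          # nurses, physicians, data entry
    "researcher": {"read", "query"},                  # assumed subset for research work
    "administrator": {"read", "write", "delete", "change", "query", "extract"},
}


def is_allowed(role, action, record_owner=None, user_id=None):
    """Return True if the role may perform the action on the given record.

    Patients are additionally restricted to their own records.
    """
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    if role == "patient":
        return record_owner is not None and record_owner == user_id
    return True


print(is_allowed("patient", "read", record_owner="p1", user_id="p1"))
print(is_allowed("patient", "read", record_owner="p2", user_id="p1"))
print(is_allowed("clinician", "delete"))
```

Keeping the table in one place means the same GUI can show or hide options per user type instead of maintaining four separate interfaces.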
          2. Hardware interface and software interface
            The Cyber-Healthcare System, with a Hadoop environment as its backbone, supports NoSQL databases, aggregate data models, and key-value databases to perform map-reduce computing and to store the results of the mappers and reducers in materialized views with high fault tolerance. Users can work with industry-standard formats such as XML, JSON, and plain text for complex, semi-structured data.
            The hardware interface comprises personal computers, desktops, laptops, Smartphones, iPhones, iPads, etc. (Natarajan, 2012).
            Software interface includes:
               a. Platforms:
                    - OS Windows 7, 8, 8.1, 10 (32-bit, 64-bit)
                                       Windows Server 2008 (64-bit)
                                       Windows Server 2012 (64-bit)
                                       Windows Server 2012 R2 (64-bit)
                                       Windows Vista SP1 and later (32-bit and 64-bit)
                     - Mac OS X hosts (64-bit)
                                        Mavericks: 10.9
                                        Yosemite: 10.10
                                        El Capitan: 10.11
                     - Linux hosts (32-bit or 64-bit)
                                        Ubuntu 10.04 to 16.04
                                        Debian GNU/Linux 6.0 (“Squeeze”) and 8.0 (“Jessie”)
                                        Oracle Enterprise Linux 5, Oracle Linux 6 and 7
                                        Red Hat Enterprise Linux 5, 6 and 7
                                        Fedora Core / Fedora 6 to 24
                                        Gentoo Linux
                                        openSUSE 11.4 to 13.2
                     - Solaris hosts (64-bit only)
                                        Solaris 11
                                        Solaris 10 (U10 and higher)                                      
               b. Emulated hardware
                     - Input devices: Standard PS/2 keyboard and mouse
                     - Graphics: Standard VGA devices
                     - Storage: Intel PIIX3/PIIX4 chips, the SATA (AHCI) interface, and two SCSI adapters (LSI Logic and BusLogic)
                     - Networking: Linux kernels version 2.6.25 or later
                                            Windows 2000, XP and Vista, drivers
                     - USB: xHCI, EHCI, and OHCI
     C. Data flow diagrams
            Healthcare data is processed in parallel by both the traditional databases in the data warehouse and the advanced Hadoop subsystem during the ETL (Extract, Transform, and Load) process, as shown in Figures 3 and 4 below:

            Figure 3: Process flow of the large healthcare data in Map/Reduce functions in Hadoop subsystem.
Source: Adapted from Intel, 2016.


            Figure 4: Data flow in both traditional data warehouse and Hadoop subsystem in parallelism in the Cyber-Healthcare System. 
Source: Adapted from Intel, 2016.

            In the Cyber-Healthcare System, a data flow document can be written in XML. For example, a typical XML design document might look like the following sample:
     <?xml version="1.0"?>
     <!-- File name: TheCyberHealthcareSystem.xml -->
          <Group>
               <Groupname>Arizona</Groupname>
                    <Hospital>XXX</Hospital>
                         <DeptInternalMedicine>AAA</DeptInternalMedicine>
                         ….
                         <DeptIntensiveCare>BBB</DeptIntensiveCare>
                         …..
                         <DeptFamilyCare>BBB</DeptFamilyCare>
                         ……..
                    ….
                    <Clinic>YYY</Clinic>
                    …..
                    <Nursinghome>ZZZ</Nursinghome>
                    ……
               <Groupname>Colorado</Groupname>
                    <Hospital>III</Hospital>
                         <DeptInternalMedicine>OOO</DeptInternalMedicine>
                         ….
                         <DeptIntensiveCare>PPP</DeptIntensiveCare>
                         …..
                         <DeptFamilyCare>QQQ</DeptFamilyCare>
                         ……..
                    ….
                    <Clinic>JJJ</Clinic>
                    …..
                    <Nursinghome>KKK</Nursinghome>
               …..  
          </Group>

     <Patient>
          <Patientname>StevenConte</Patientname>
               <PatientID>7742661926</PatientID>
               <PatientDOB>04301975</PatientDOB>
               <PatientAddress>XYZ</PatientAddress>
               <PatientOccupation>zyx</PatientOccupation>
               <PatientAge>46</PatientAge>
               <PatientHeight>5ft8Inch</PatientHeight>
               <PatientWeight>150</PatientWeight>
               <PatientIllness>Vertigo</PatientIllness>
               …………
     </Patient>
     ………..
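As a minimal sketch of how such a patient record could be read by a program, the following example parses a small, well-formed Patient element with Python's standard xml.etree.ElementTree module. The embedded XML is a shortened, assumed version of the sample above, and the function name parse_patient is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the Patient element (an assumption for this sketch).
PATIENT_XML = """<?xml version="1.0"?>
<Patient>
  <Patientname>StevenConte</Patientname>
  <PatientID>7742661926</PatientID>
  <PatientDOB>04301975</PatientDOB>
  <PatientWeight>150</PatientWeight>
  <PatientIllness>Vertigo</PatientIllness>
</Patient>
"""


def parse_patient(xml_text):
    """Return the child elements of a Patient element as a tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


record = parse_patient(PATIENT_XML)
print(record["PatientID"], record["PatientIllness"])
```

The parser rejects documents with mismatched tags, so a record has to be well-formed before it can be loaded this way.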
     D. Overall system diagrams
            The overall Cyber-Healthcare System is based on a Hadoop ecosystem that consists of Hadoop YARN, the Common Utilities unit, HDFS, Hadoop Map/Reduce, and the NoSQL database Cassandra. Hadoop YARN is a communication and control unit that provides job scheduling and cluster resource management. The Common Utilities unit is a supporting unit that provides libraries and utilities. HDFS provides access to the health data sets. Hadoop MapReduce applies parallel processing to the large healthcare data sets. Data sets that are terabytes in size are broken into blocks of 64 or 128 MB and stored on multiple low-cost commodity nodes in HDFS, where the Map and Reduce functions retrieve insightful information for end users such as patients and frontline care providers (e.g., nurses, physicians, healthcare technologists, etc.). The column-oriented database Cassandra stores the healthcare data that is fed to HDFS for data processing.
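To make the block sizes mentioned above concrete, the following sketch computes how many HDFS blocks a data set of a given size occupies. The function name is hypothetical, and the replication factor of 3 is an assumption for this sketch (it is the usual HDFS default, but the proposal does not specify one).

```python
MB = 1024 ** 2
TB = 1024 ** 4


def block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Number of blocks needed to hold a file (the last block may be partly filled)."""
    return -(-file_size_bytes // block_size_bytes)  # integer ceiling division


blocks_128 = block_count(1 * TB)                 # 1 TB split into 128 MB blocks
blocks_64 = block_count(1 * TB, 64 * MB)         # 1 TB split into 64 MB blocks
REPLICATION = 3                                  # assumed replication factor
stored_copies = blocks_128 * REPLICATION

print(blocks_128, blocks_64, stored_copies)
```

Larger blocks mean fewer blocks for the name node to track, while smaller blocks allow more map tasks to run in parallel on the same data set.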

            Figure 5 shows the overall central Hadoop System with the control unit Hadoop YARN, supportive unit Common Utilities, and Cassandra in the Cyber-Healthcare System.
Source: Created by ThienSiLe (TSL), December 2016
     E. Communication flow chart
For communication, the Cyber-Healthcare System consists of the central Hadoop ecosystem connected to four groups, i.e., New Hampshire, Massachusetts, Rhode Island, and Connecticut, in a star configuration as shown in Figure 6 below. Each group is linked to local hospitals, outpatient clinics, nursing homes, and rehabilitation centers. Each organization has many departments. Each department has its own care team of frontline care providers, including physicians, nurses, and family members, who provide health care services to patients. In addition, the environment group, which comprises regulators, Medicare, Medicaid, insurance companies, healthcare purchasers, and research funders, can communicate with institutions such as hospitals, clinics, nursing homes, rehabilitation centers, etc.
Figure 6 depicts a high-level communication flow chart among agencies in the Cyber-Healthcare System.
Source: Created by TSL, December 2016

II. Database
The Cyber-Healthcare System can take advantage of Hadoop HDFS as an inexpensive option for storing healthcare big data. However, because of the colossal data sets involved, including patient medical activities, clinical health information such as electronic health records, and public health data in the North-East region, the column-oriented database Cassandra is more suitable as the storage database in the Hadoop ecosystem. Note that Hadoop HDFS still serves as distributed storage for data processing in the System.
     A. Health data
            Health data is a part of big data - the broad and complex data sets - and includes both structured and unstructured data. Healthcare data can be individual health records, health information, patient-related blogs, mobile data, PDF files, web log data, forums, website content, spreadsheets, photos, clickstream data, RSS feeds, word processing documents, medical scanning images, videos, audio files, RFID tags, social media data, XML patient data, call center transcripts, etc. The sources of healthcare data include electronic patient record systems, the Internet, search engines, online social media networks such as Facebook and Twitter, and medical information exchanged or posted by millions of people, e.g., patients, healthcare providers, and disease researchers, in hospitals, clinics, nursing homes, and rehabilitation centers across the globe. Data sets can be replicated and shared among nodes in the scalable distributed clusters of the Cassandra database system.
Insightful information that is extracted from healthcare data can be categorized into three categories (Schneiderman, Plaisant, & Hesse, 2013):
          1. Personal health information
Physicians and patients collect information about their practice and their own health habits.
          2. Clinical health information
Electronic health record systems can improve patient care and offer useful insights into practical patterns of treatment.
          3. Public health information
A large quantity of public health data is collected to help policy makers make more reliable decisions.
     B. Database Framework
            Cassandra is a non-relational data storage system that does not require a relational schema or joins and that relaxes some of the ACID (Atomicity, Consistency, Isolation, and Durability) properties (Sadalage, & Fowler, 2012). Cassandra can handle big data, particularly healthcare data, in massive volume and various forms, and can process it quickly to obtain useful insights. As a contemporary column-oriented NoSQL database, Cassandra follows a Bigtable-style data model with a two-level aggregate structure. The first-level key is a row identifier that selects a map of more detailed values. The second-level values are columns, each consisting of a column key and a data value. An aggregate is a collection of data that users can interact with as a unit. Data in a column-oriented database is structured in both a row-oriented and a column-oriented way, which allows users to treat the columns within a row aggregate as units of data.
Based on personal health information such as patient medical activities, clinical health information such as electronic health records, and public health data stored in the column-oriented database Cassandra, data analytics holds the promise of improving the quality of healthcare delivery and has the potential to enhance patient care, save lives, and lower treatment costs. Big data analytics (BDA) on healthcare data in Cassandra plays a crucial role in improving the quality of healthcare, patient treatment, and disease prevention. The promise of data analytics and big data in healthcare includes supporting a wide range of medical and healthcare activities in physicians’ offices, hospital networks, outpatient clinics, etc. The potential consists of improving patient care and saving lives at lower cost by applying big data to clinical operations, research and development, public health, evidence-based medicine, genomic analytics, pre-adjudication fraud analysis, patient profile analytics, etc.
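The two-level aggregate structure can be pictured as a map of maps: a row key selects a row, and a column key selects a value within that row. The sketch below models this with plain Python dictionaries; it is not a Cassandra client, and the table contents, column names, and helper functions are hypothetical.

```python
# Row key -> {column key -> value}. All names and values are illustrative only.
patients = {
    "7742661926": {
        "name": "StevenConte",
        "illness": "Vertigo",
        "visit:2016-11-02": "diagnostic test",
    },
    "0000000001": {
        "name": "Example Patient",
        "visit:2016-12-01": "follow-up",
    },
}


def get_column(table, row_key, column_key, default=None):
    """First level: look up the row. Second level: look up the column in that row."""
    return table.get(row_key, {}).get(column_key, default)


def visits(table, row_key):
    """Return the visit columns of one row; rows need not share the same columns."""
    row = table.get(row_key, {})
    return {key: value for key, value in row.items() if key.startswith("visit:")}


print(get_column(patients, "7742661926", "illness"))
print(visits(patients, "0000000001"))
```

Note that the second row has no "illness" column at all, which mirrors the point that rows in a column family do not need a fixed schema.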
The typical findings in healthcare data include the promise and potential of the BDA in healthcare in Cassandra as shown below:
          a. The promise of supporting a wide range of medical and healthcare functions. 
          b. The potential to improve the quality of healthcare delivery at lower cost (Raghupathi, & Raghupathi, 2014).
Cassandra provides several advantages to the System:
               - Since it is open-source software, it is inexpensive and easy to implement.
               - Data is replicated to multiple nodes and can be partitioned.
               - It is easy to distribute across the network.
               - It does not require a schema.
               - It can scale up and down.
               - The data consistency requirement (CAP) is relaxed.
     C. Cassandra is a good fit 
The column-oriented database Cassandra is a good fit for the Cyber-Healthcare System (Sadalage, & Fowler, 2012; Upadrasta, & Chungath, 2014):
(1) Event logging
Cassandra, with its ability to store any data structure, is a good choice for storing patient records, patient medical activities, doctors’ visits, etc. All events can be written as columns under a row key such as the patient ID. Since its writes scale well, Cassandra is ideal for recording patients’ activities and similar events.
 (2) Content management
Column families in Cassandra consist of many columns for data entries with tags, categories, links, trackbacks, etc. in different columns. Comments can be either stored in the same row or moved to different key spaces. Illness or diseases can be put into different column families (Confino, 2010).
 (3) Counters
In analyzing healthcare data, healthcare providers usually need to count and categorize patients to calculate analytics or statistics. Cassandra provides counter columns (CounterColumnType), which are specified when a column family is created.
 (4) Expiring usage
Frontline healthcare providers such as physicians, nurses, technologists, etc. may want to show patient records, treatment predictions, preventive care information, etc. for a limited time. They can do that by using expiring columns. Cassandra allows information to be displayed for a given period, after which it is deleted automatically. The period is known as the TTL (Time To Live) and is defined in seconds. The column is deleted after the TTL has elapsed.
 (5) When not to use Cassandra
Cassandra should not be used for writes and reads that require ACID transactions, because consistency may fail. Cassandra is also a poor fit for early prototypes or initial tech spikes, because the early stages may require changes to the columns. In Cassandra, the cost of changing queries may be high, whereas in traditional relational databases the cost lies in changing the schema.
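The expiring-column behavior described in item (4) can be imitated with a small in-memory class. This is only a simulation of the TTL idea, not a Cassandra client or its API; the class name, method names, and sample values are assumptions, and the current time is passed in explicitly so the behavior is easy to follow.

```python
class ExpiringColumns:
    """In-memory stand-in for columns written with a TTL (time to live, in seconds)."""

    def __init__(self):
        # (row_key, column_key) -> (value, expiry time or None for no expiry)
        self._data = {}

    def put(self, row_key, column_key, value, now, ttl_seconds=None):
        """Store a value; if ttl_seconds is given, the value expires at now + ttl_seconds."""
        expires_at = None if ttl_seconds is None else now + ttl_seconds
        self._data[(row_key, column_key)] = (value, expires_at)

    def get(self, row_key, column_key, now):
        """Return the value, or None if it is missing or its TTL has elapsed."""
        entry = self._data.get((row_key, column_key))
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._data[(row_key, column_key)]  # drop the expired column
            return None
        return value


store = ExpiringColumns()
store.put("7742661926", "lab_result_notice", "ready", now=0, ttl_seconds=60)
store.put("7742661926", "name", "StevenConte", now=0)

print(store.get("7742661926", "lab_result_notice", now=59))
print(store.get("7742661926", "lab_result_notice", now=60))
print(store.get("7742661926", "name", now=10_000))
```

In a real deployment the database tracks the clock and removes expired columns itself; the explicit `now` argument exists only to keep the sketch deterministic.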
III. Business Plan
The world is dynamic, changes constantly, and is increasingly intertwined, with a great deal of uncertainty and chaos. It is almost impossible to predict the future from the known present. In an aggressively competitive healthcare business environment, many organizations realize that innovating on existing systems through the interaction between humans and technology, as in the Cyber-Healthcare System, is important for healthcare services and for medical research and development (R & D). The business plan of the Cyber-Healthcare System, which uses big data analytics in healthcare, is provided as follows (VA/VHA Guidebook, 2011; Bplans, 2016):
     1. Executive summary 
            In an effort to apply analytics to improve healthcare services, reduce costs, and enhance medical R & D, the Cyber-Healthcare System is proposed to hospitals, outpatient clinics, nursing homes, and rehabilitation centers in four states (New Hampshire, Massachusetts, Rhode Island, and Connecticut) for family medicine practice, teaching, and research. It focuses on diagnosis and treatment for patients of all ages. The System also emphasizes preventive medicine, wellness, and patient health.
     2. Description
            The proposed Cyber-Healthcare System utilizes big data analytics in health care in the North-East region and gives corporate management descriptive information and guidance on the architecture and framework. The project solves the business problem of analyzing huge sets of scattered, complex data, particularly healthcare services data, in the healthcare industry.
     3. Objectives
          a. Creating a new health care system that consolidates the obsolete and wasteful systems in the North-East region.
          b. Providing high-quality healthcare to residents in the area.
          c. Innovating a medical practice that will exceed patients’ needs and expectations.
          d. Increasing the number of patients by 20% each year with better services and references.
          e. Providing friendly GUIs (Graphical User Interfaces) for patients, frontline healthcare providers, and professional researchers to access information and data (e.g., PHI, PII, EHR, or EMR) appropriately from the designated websites.
          f. Reducing doctors’ appointments from 40% down to 5%.
          g. Increasing the number of average visits by 20% in each state.
     4. Background
            Research data shows that most well-informed, health-conscious healthcare organizations can reduce costs, decrease absenteeism, and increase productivity. As the generation and analysis of big data grows, IBM is a leader in the application and integration of BDA in many fields, particularly healthcare. IBM Watson employs BDA on a cognitive computing platform. It can connect massive, dynamic, and complex text data across patient records and medical literature to create hypotheses among hundreds of variables for treatment recommendations. Watson uses big data analytics to process data for specific patients, whose health records, illness history, and physicians’ studies are examined carefully. The solution is deployed at many academic medical clinics and hospitals, such as Cleveland Clinic and Mayo Clinic. Some institutions partner with IBM Watson in an ecosystem to provide effective treatments at lower cost (Power, 2015). Another example is USAA, a financial services provider that uses IBM Watson to help about 15,000 members each year adjust from military to civilian life. USAA and IBM Watson use a software tool, Enhanced Virtual Assistant (EVA), to execute transactions such as transferring money, paying bills, and communications, drawing on BDA’s data generation and analysis across their 140 products.
     5. Organizational Assessment
            The current hospitals have supported patient health promotion, but the programs have failed to improve health outcomes and revenue. A recent survey of about 1000 patients and EHRs indicates that patients’ wellness and health are declining (VA/VHA Guidebook, 2011).
          a. 56% of the patients have a BMI (body mass index) over 25; 23% of patients are obese (BMI > 30).
          b. 25% of patients report that they use tobacco.
          c. 67% of patients report that they are under stress in the workplace.
          d. 95% of patients report that they want a wellness program.
          e. The average claim of $442 is reported on back injuries.
            Note that, given current national health trends, the percentage of overweight and obese patients will continue to rise, bringing more compensation claims.
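The BMI thresholds used in the survey can be applied with a short calculation. The sketch below uses the standard BMI formula (weight in kilograms divided by height in meters squared, or 703 times pounds divided by inches squared) and the cut-offs quoted above (over 25 and over 30); the function names and the category labels are assumptions for illustration.

```python
def bmi(weight_kg, height_m):
    """Body mass index from metric units: kg / m^2."""
    return weight_kg / height_m ** 2


def bmi_from_imperial(weight_lb, height_in):
    """Body mass index from pounds and inches, using the 703 conversion factor."""
    return 703 * weight_lb / height_in ** 2


def category(bmi_value):
    """Classify a BMI value using the thresholds quoted in the survey above."""
    if bmi_value > 30:
        return "obese"
    if bmi_value > 25:
        return "overweight"
    return "not overweight"


# Example: a patient who is 5 ft 8 in (68 inches) tall and weighs 150 lb.
value = bmi_from_imperial(150, 68)
print(round(value, 1), category(value))
```

Run over a patient table, the same function would yield the share of patients in each category, which is the kind of figure reported in items a. to e.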
     6. Proposed Services
            Healthcare organizations that participate in the Cyber-Healthcare System will focus on a comprehensive patient health promotion program to contain rising costs and enhance the wellness and health of patients:
               - Detecting the diseases at earlier stages.
               - Managing individual and population health efficiently.
               - Detecting healthcare fraud more quickly.
               - Analyzing large amounts of historical data, such as length of stay, choice of elective surgery, and surgeries that brought no benefit.
               - Identifying patients at risk for medical complications.
               - Identifying patients at risk for advancement in disease stages.
               - Identifying causal factors of illness progression.
               - Pinpointing patients who are the greatest consumers of health resources.
               - Providing patients with the information for making informed decisions.
               - Helping patients manage their own health.
               - Tracking healthier behaviors.
               - Identifying treatment.
               - Reducing re-admissions by identifying lifestyle factors that increase the risk of an adverse event.
               - Improving outcomes by examining vitals from at-home health monitors.
               - Managing population health by detecting vulnerabilities within the patient population during disease outbreaks.
     7. Target Market Analysis
            Among the current patients in the four states (NH, MA, RI, and CT), 57% are female. The median age is 50, with the following age distribution: (1) 400,000 patients between 20 and 29, (2) 800,000 patients between 30 and 39, (3) 800,000 patients between 40 and 49, (4) 1,600,000 patients between 50 and 59, and (5) 600,000 patients at 60+ (VA/VHA Guidebook, 2011).
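The age distribution above can be checked with a few lines of arithmetic: summing the five groups gives the total patient count, and accumulating the groups in order shows which bracket contains the median patient. The function name is hypothetical; the counts are the ones quoted in the paragraph.

```python
# Age brackets and patient counts as quoted in the target market analysis.
AGE_GROUPS = [
    ("20-29", 400_000),
    ("30-39", 800_000),
    ("40-49", 800_000),
    ("50-59", 1_600_000),
    ("60+", 600_000),
]


def median_bracket(groups):
    """Return the label of the bracket that contains the median patient."""
    total = sum(count for _, count in groups)
    midpoint = total / 2
    running = 0
    for label, count in groups:
        running += count
        if running >= midpoint:
            return label
    raise ValueError("empty distribution")


total_patients = sum(count for _, count in AGE_GROUPS)
print(total_patients, median_bracket(AGE_GROUPS))
```

The cumulative count reaches 2,000,000 at the end of the 40 to 49 bracket, just short of half of the total, so the median patient falls near the start of the 50 to 59 bracket, which is consistent with the stated median age of 50.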
            The Patient Health Promotion Program (PHPP) from the Cyber-Healthcare System is open to all patients in four states on a voluntary basis. The annual assessment will be performed on each population group for specific services with unique health needs.
     8. Marketing Plan
            New Patient Orientation, State Human Resources, and Occupational State Health will conduct awareness classes for target patients and distribute informational materials such as booklets and bulletins to all of them through email broadcasts, signage, newspapers, and fliers. A website with an events calendar is available to the patients.
     9. Resources
            Resources for implementing the Cyber-Healthcare System including hardware, software, computers, workstations, facilities, administrators, and technical support groups are operated and controlled by the State Public Health Offices and Board of Trustees. Additional resources are available in annual assessments. In-kind resources such as Reproduction, Information Resources, Medical Media to support the PHPP program are anticipated.    
IV. Security Policy Proposal
            The Cyber-Healthcare System complies with all regulations, policies, and governance for the medical healthcare industry in its design as follows:
     A. Regulations
In practice, market research and ethics in Internet-based healthcare data are often at odds with each other. Big Data Analytics (BDA) gives organizations both technical and strategic capabilities to generate value from the data they store. As BI (Business Intelligence) and BDA flourish, there will be more security violations and privacy issues (Quora, 2014). There is a prominent risk of violating personal privacy. For example, terrorists could hack healthcare systems such as the Cyber-Healthcare System to sabotage them, harm people, and advance their own ideological, political, or religious aims.
Big Data’s vital role in advancing medicine can be constrained by the HIPAA (US Health Insurance Portability and Accountability Act) Security, Privacy, and Breach Notification Rules that regulate medical information. These laws, rules, and guidelines restrict and govern the disclosure, security, collection, maintenance, and transmission of PHI, including electronic PHI, used by healthcare providers, health insurers, or medical R & D groups. The PHI may include social security number, driver’s license number, account number, photographs, credit or debit card number, required security code, access code, password, medical information, health insurance information, username, security questions, etc. HIPAA also requires a covered entity to provide notice of its privacy practices. The US Government has enforcement authority. Note that HIPAA does not apply to health information that is not personally identifiable, such as aggregate data in NoSQL databases. It also does not apply to health information used by individuals or entities that do not fall under the definition of covered entities in the HIPAA Act (Practical Law, 2016). Many countries have their own healthcare laws, but HIPAA appears to be one of the best in healthcare data governance in the US.
The System takes data privacy and security breaches seriously and uses the latest antivirus software, firewalls, etc. to protect the integrity of data and safeguard patients’ information. The System complies with the government’s detailed, and at times controversial, regulations and obeys all medical rules. In addition, the Cyber-Healthcare System will work to obtain International Organization for Standardization ISO 9001 Certification in the healthcare industry (Nolan, 2015).
     B. Privacy Policies
            Information about users’ use of the website is collected through a tracking cookie and server access logs. The collected information includes the following:
          a. The IP address from which the user accesses the website.
          b. The type of operating system (OS) and browser the user uses to access the System site.
          c. The date and time the user accesses the Cyber-Healthcare System site.
          d. The HTML pages the user visits.
          e. The address of the page from which the user followed a link to the System site.
            Some of the information is gathered by a tracking cookie set by the Hadoop Analytics or Google Analytics service, as described in the privacy policy. Users may refer to their browser documentation for instructions on how to disable the cookie if they do not want to share the data with Hadoop or Google.
The Cyber-Healthcare System respects the patients’ rights. Patients have the following typical rights:
            - The right to receive notice of privacy practices from healthcare providers.
            - The right to see their protected health information and receive a copy.
            - The right to request changes to their records to correct errors or add information.
            - The right to receive a list of the disclosures made of their PHI.
            - The right to request confidential communication.
            - The right to complain.
            The Cyber-Healthcare System gathers this information to make the website more useful and friendly to visitors and to better understand how and when the website is used. The Cyber-Healthcare System does not collect or track personally identifiable information, nor does it associate the gathered data with personally identifying information from other sources.
            By using this website, the user consents to the collection of this data in the manner described and for the purpose of solving the challenges and business problems in the healthcare field (Apache Software Foundation, 2014; Natarajan, 2012).
     C. Governance
HIPAA is the federal Health Insurance Portability and Accountability Act of 1996. It was designed to safeguard healthcare information, help people retain health insurance, and help control administrative costs in the healthcare industry (HIPAA, 1996). On privacy, HIPAA emphasizes the protection and maintenance of personal health information in all health-related organizations. HIPAA requires that (1) frontline providers (e.g., physicians, nurses, etc.), (2) medical producers (e.g., pharmaceutical, medical device companies, etc.), and (3) payers (e.g., insurance companies) comply with all the laws and rules in governance.
            The Cyber-Healthcare System complies with all HIPAA governance rules.
    D. Assumptions and limitations
            The Cyber-Healthcare System is developed and designed based on the following assumptions and limitations:
          1. Assumptions (Flower, 1999):
               - The System’s clients are patients.
               - The System’s contact with patients is high intensity, low touch.
               - Doctors are independent carriers of information and judgment.
               - Healthcare is event-driven.
               - Much of ill health will be predictable and preventable. 
               - Patients will be partners in managing their health.
               - Data in the System’s centralized repository is assumed clean, reliable, and credible.
               - All institutions such as hospitals, clinics, nursing homes, etc. use the same platform to access, view, query, and enter the large data sets in the centralized repository.
               - All frontline care providers in the care team are trained to use the System properly and professionally.
               - The System keeps all sources of time visible to the guest synchronized to a single time source, the monotonic host time.
          2. Limitations (Hortonworks, 2016)
               - Some features are beta (labeled as experimental). Such features are provided but are not formally supported. However, users’ suggestions and feedback are welcome.
               - Poor performance with 32-bit AMD CPUs may affect Windows and Solaris platforms.
               - Poor performance with a 32-bit Intel CPU model mainly affects Windows, Solaris, and the Linux kernel.
               - NX (no-execute, data execution prevention) works only on computers with a 64-bit OS.
               - Windows XP has slower transmission rates because it does not support segmentation offloading.
               - Shared folders are not supported on OS/2 computers.
     E. Justification
            The Cyber-Healthcare System is a modern state-of-the-art system in the contemporary network of hospitals, outpatient clinics, nursing homes, and rehabilitation centers in the North-East region. The System is developed to eliminate isolation among hospitals, reduce inefficiency in care management, and prevent a loss of opportunities for advancing patient treatments. The Cyber-Healthcare System is designed with the following justifications:   
          1. Centralizing the scattered sources of colossal data sets from many agencies, various hospitals, and clinics.    
          2. Transforming huge, unreliable data sets that contain duplicate and redundant information into credible and reliable data sets.
          3. Establishing a large regional healthcare network that allows users such as patients and frontline care providers (physicians, nurses, family members), each with different privileges, to access, view, and search the information needed for patient treatment and cure at the lowest possible cost.
          4. The Cyber-Healthcare System is implemented in a Hadoop environment as described in Section I. Network Architecture above.
          5. The System’s architecture is developed based on four target elements:
               a. Patients.
               b. The care team consists of physicians, nurses, and family members.
               c. The organization includes infrastructure and resources such as hospitals, clinics, nursing homes, and rehabilitation centers.
               d. The environment comprises regulation, policy, and market actors such as regulators, Medicare, Medicaid, insurance companies, healthcare purchasers, research funders, etc.
          6. The System is designed to tackle the huge data sets’ challenges in the healthcare industry. Some healthcare data challenges are:
               a. Capturing data is difficult.
               b. Curation is not easy.
               c. Storage requires huge memory and disk capacity.
               d. Sharing data is complicated.
               e. Transferring data takes a long time because of its huge size.
               f. Analysis of data requires advanced analytical tools.
               g. The presentation is sophisticated.
          7. Organizations in the System can provide better, higher-quality services based on historical data from patients’ previous medical records.
          8. The System has a data visualization feature that lets users access (Schneiderman, Plaisant, & Hesse, 2013):
               - Personal health information
               - Clinical health information
               - Public health information
Conclusion
In summary, this U5 IP document provided a design proposal for the Cyber-Healthcare System in the North-East region, which uses big data analytics in health care. It covered four sections: (1) network architecture, (2) the Cassandra database, (3) business plan, and (4) security policy proposal, as follows:
     I. Network Architecture:
            The framework for applying big data analytics to complex healthcare data was a Hadoop ecosystem with its primary components, e.g., Hadoop Common, Hadoop YARN, Hadoop HDFS, and Hadoop Map/Reduce, together with the Cassandra database. The external interfaces, data flow diagrams, communication flow chart, and overall system diagram were described in detail.
     II. NoSQL Database
            Cassandra served as the critical storage foundation for big healthcare data, including PHI, PII, EHR, and EMR. It is a good fit for the Cyber-Healthcare System, where the Hadoop Distributed File System (HDFS) is in the data processing loop.
     III. Business Plan
            The business plan addressed the executive summary, description, objectives, background, organizational assessment, target market analysis, market plan, and resources of the Cyber-Healthcare System.
     IV. Security Policy Proposal
            The security policy proposal covered practice regulations, privacy policy, data governance, justifications, and assumptions and limitations. The Cyber-Healthcare System complies with HIPAA and expects to pass ISO 9001 certification in the healthcare industry (Nolan, 2015).
The virtual Cyber-Healthcare System is a proposal for a Hadoop framework that applies big data analytics to huge healthcare data sets to improve healthcare services, reduce costs, enhance R & D in medicine, predict the probability of disease occurrence, and enable actionable decision-making. Note that the design covers four categories (network architecture, databases, business plan, and security policy proposal), which exceeds the requirement of three out of five given categories.


REFERENCES
Apache Software Foundation (2014). What is Apache Hadoop? Retrieved November 08, 2015 from http://hadoop.apache.org/

Bplans (2016). Free medical and health care sample business plans. Retrieved December 06, 2016 from http://www.bplans.com/medical_and_health_care_business_plan_templates.php

Confino, J. (2010). Why you need nosql in your toolbox. Retrieved October 17, 2016 from http://chariotsolutions.blogspot.com/2010/01/why-you-need-nosql-in-your-toolbox.html.

Flower, J. (1999). The revolution in our assumptions about healthcare. Retrieved September 08, 2016 from http://www.well.com/~bbear/assumptions.html

Gartner Group (2013). Gartner predicts business intelligence and analytics will remain a top focus for CIOs through 2017. Press Release. Las Vegas, NV. Retrieved June 4, 2015 from http://www.gartner.com/newsroom/id/2637615.

HIPAA Act (1996). The federal health insurance portability and accountability act. Retrieved October 19, 2015 from http://tn.gov/health/topic/hipaa.

Information Builder (2013). Data in motion – big data analytics and healthcare. Retrieved December 04, 2016 from http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/data%20in%20motion%20big%20data%20analytics.pdf

Natarajan, R. (2012). Apache Hadoop Fundamentals – HDFS and MapReduce Explained with a Diagram. Retrieved November 01, 2015 from http://www.thegeekstuff.com/2012/01/hadoop-hdfs-mapreduce-intro/

Nolan, J. (2015). Would hospitals benefit from ISO 9001? Retrieved September 08, 2016 from http://advisera.com/9001academy/blog/2015/07/21/would-hospitals-benefit-from-iso-9001/

Power, B. (2015). Artificial intelligence is almost ready for business. Retrieved August 14, 2016 from https://hbr.org/2015/03/artificial-intelligence-is-almost-ready-for-business

Practical Law (2016). PLC - Data protection in the united states: overview. Retrieved November 21, 2016 from http://us.practicallaw.com/6-502-0467

Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.

Schneiderman, B., Plaisant, C., & Hesse, B. (2013). Improving healthcare with interactive visualization methods. Retrieved September 06, 2016 from https://www.cs.umd.edu/~ben/papers/Shneiderman2013Improving.pdf

Upadrasta, B., & Chungath, A. (2014). NoSQL, NewSQL, or RDBMS: How to choose. Retrieved October 17, 2016 from http://www.informationweek.com/big-data/big-data-analytics/nosql-newsql-or-rdbms-how-to-choose/a/d-id/1297861

VA/VHA Employee Health Promotion Disease Prevention Guidebook (2011). Sample business plan – public health. Retrieved December 06, 2016 from www.publichealth.va.gov/docs/employeehealth/12-Sample-Business.pdf


Thursday, December 15, 2016



A Policy Proposal on Data Breach
by TSL
Colorado Technical University
CS881-1604C-01
Professor: Dr. James Webb
26-November-2016


Introduction

With contemporary cloud and mobile computing, automation, and Web technologies in the competitive data-driven market and Internet-based economy, data stored at low cost and processed at high speed has exploded and become ubiquitous in both the public and private sectors. Big Data, a generic paradigm for data characterized by the 5 V’s (massive Volume, Variety in forms, high Velocity in processing, truthful Veracity, and real-time Value), poses major challenges in extracting or transforming data into insightful information for organizational decisions (Gartner, 2013). Big Data may be categorized into three types: (1) structured data, e.g., relational data; (2) semi-structured data, e.g., data in XML or JSON formats; and (3) unstructured data such as Word documents, PDFs, text, media blogs, and streaming data. Healthcare data may be among the most unstructured forms of data for big data analytics (BDA). This individual project in Unit 4 (U4 IP) discusses Big Data in healthcare and healthcare issues such as fraud, data breach, and privacy, and it addresses BDA applications in healthcare. This U4 IP presents a policy proposal that covers the consequences of data breaches and the importance of data privacy in the following sections:
I. Healthcare Data
II. Issues in Healthcare
     A. Fraud
     B. Data breach
     C. Data privacy
     D. Regulatory laws
III. BDA Applications in Healthcare
     A. Big Data Analytics
     B. BDA on fraud and breach
     C. BDA on data privacy protection
IV. Policy Proposal
     1.0 Purpose
     2.0 Scope
     3.0 Legislation
     4.0 Consequences of data breaches
     5.0 Importance of data privacy
     6.0 Policy
     7.0 Breach management plan
          7.1 Identification and Classification
          7.2 Containment and Recovery
          7.3 Risk assessment
          7.4 Notification of Breaches
          7.5 Evaluation and Response
     8.0 Roles and responsibilities
          8.1 Line managers
          8.2 Individual users
     9.0 Enforcement
     10.0 Review and update
I. Healthcare Data
In general, the healthcare industry generates a vast amount of data, driven by patient care, personal health records, clinical health systems, compliance and regulatory requirements, and public health information (Raghupathi & Raghupathi, 2014). Broad and complex healthcare data sets that mix structured and unstructured data are now often stored in NoSQL databases such as Cassandra, Amazon DynamoDB, or MongoDB. They can be replicated and shared among many nodes and servers in scalable distributed clusters. A backup system provides a safeguard and a means of recovery if a disaster such as hacking or data loss occurs. Many medical organizations amass and analyze huge amounts of medical data, protected health information (PHI), and personally identifiable information (PII). PHI and PII, which can be extracted with advanced analytics tools such as Hadoop, R Project, or Tableau, fall into three categories (Schneiderman, Plaisant, & Hesse, 2013):
          1. Personal health information
Physicians collect information about their practice, and patients collect information about their own health habits.
          2. Clinical health information
Electronic health record systems can improve patient care and cure and provide useful insights into practical patterns of treatment.
          3. Public health information
A large quantity of PHI/PII is collected to help policy makers make more reliable decisions.
II. Issues in Healthcare
Healthcare data is believed to contain hidden, insightful information that is valuable for enhancing patient cure and treatment and for enriching physicians’ skills and healthcare systems. For example, protein therapeutics data, clinical trial data, genetics and genetic mutation data, etc. can be harvested to improve daily healthcare processes (Hurwitz, Nugent, Halper, & Kaufman, 2016). Many advanced data analytics tools are available in industry, particularly in health services for the diagnosis of illness (EMC Education Services, 2015).
     A. Fraud
Health care fraud continues to increase significantly, particularly under the Affordable Care Act. HHS (US Department of Health and Human Services) reported civil and criminal charges against 301 professional caregivers and doctors in health care fraud schemes involving $900 million in false claims (Office of Public Affairs, 2016). In its news release of June 22, 2016, HHS addressed various health care fraud-related crimes, ranging from conspiracy to commit health care fraud to identity theft, money laundering, and violations involving Medicare, Medicaid, etc. Health Net Federal Services, LLC (HNFS) defines fraud in health care as a misrepresentation of fact or an intentional deception to obtain unauthorized payment or health services. For example, a care provider submits claims to health insurance companies for services that it never delivered, or bills for services at a higher price. Abuse in health care is action that falls outside the acceptable standard of conduct, such as failing to maintain medical or financial records or refusing access to medical records (Health Net Federal Services, 2016).
     B. Data breach
     According to the Ponemon Institute, the average cost of a data breach is $3.8 million, up 23% in 2013. The highest cost per stolen record, an average of $363, is in the health care field. The cost of lost business from a breach rose from $1.23 million to $1.57 million in 2013. The US and Germany have the most expensive data breaches. Those data breaches could cost the healthcare industry $6 million each year. The cost components of a data breach are investigation, remediation, notification, identity-theft repair, regulatory fines, loss of business, and class-action lawsuits. Criminal attacks are the leading cause of data breaches in healthcare (Raul, 2014). In 2015, 78% of healthcare organization breaches came from web-borne malware attacks. Many healthcare organizations remain unprepared for data breaches: only 40% of healthcare organizations are concerned about cyber attacks, and 56% cite a lack of funding and resources for incident response to data breaches. Although external forces are the leading cause of data breaches, internal causes also raise concerns (Pallardy, 2015).
     Some data breaches resulted in substantial settlement costs, as shown below:
          1. In February 2015, Anthem, the second-largest insurer in the US, reported the largest healthcare data breach. Hackers accessed the personal information of 80 million customers and employees. The hackers obtained credentials from 5 Anthem technology workers and used them to log in to the system. The causes were weak login security and an unencrypted database.
          2. In 2010, the payer was fined $1.7 million for a smaller breach that involved the information of 612,000 people. The payer also faced two class-action lawsuits.
     C. Data privacy
In the healthcare field, Big Data is a valuable asset for medical science, but it poses a potential risk to patients (Bertolucci, 2014). Many medical organizations amass and analyze huge amounts of medical data or protected health information (PHI).
Market research and ethics or data privacy in Internet-based Big Data are usually at odds with each other in practice. Big Data Analytics offers both technical and strategic capabilities to generate value from the Big Data that organizations store, particularly healthcare data. With the rise of BDA, BI (Business Intelligence), and more recently AI (Artificial Intelligence) and IoT (Internet of Things), there are more opportunities for security and privacy violations (Quora, 2014). The risk of privacy violations is a prominent threat to individuals and to both public and private organizations. For example, terrorist hackers may use advanced analytics tools to access healthcare systems illegally for their own benefit or to harm people. Such issues can become urgent and disastrous on a massive scale. Government regulatory agencies and protection organizations have become involved and enforce the law with new rules and in-depth, sometimes controversial, regulations.
     D. Regulatory laws
In the US, regulatory laws, rules, and practical guidance were introduced to tackle healthcare fraud, data breaches, and data privacy violations. HIPAA is the federal Health Insurance Portability and Accountability Act of 1996. It was designed to help people retain health insurance, safeguard healthcare information, and help control administrative costs in the healthcare industry (HIPAA Act, 1996). On privacy, HIPAA emphasizes the protection and maintenance of personal health information in all health-related organizations. HIPAA requires (1) healthcare providers (e.g., physicians and nurses), (2) producers (e.g., pharmaceutical and medical device companies), and (3) payers (e.g., insurance companies) to comply with all of its laws and governance rules. The Security, Privacy, and Breach Notification Rules regulate medical information. These laws, rules, and guidelines restrict and govern the disclosure, security, collection, maintenance, and transmission of PHI, including electronic PHI, used by healthcare providers, health insurers, or medical R & D groups. PHI/PII may include a social security number, driver’s license number, account number, photographs, credit or debit card number, required security code, access code, password, medical information, health insurance information, username, security questions, etc.
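As an illustration of how identifiers like those listed above might be masked before data is shared for analytics, the following is a minimal Python sketch. The two regular expressions and the function name mask_identifiers are assumptions made for this example only; they cover just SSN-like and card-like numbers and are not a complete or compliant de-identification method.

```python
import re

# Hypothetical patterns for two of the identifier types named in the text.
# Real PHI/PII detection needs far broader coverage than this sketch.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def mask_identifiers(text: str) -> str:
    """Replace SSN-like and card-like numbers with fixed placeholders."""
    text = SSN_PATTERN.sub("[SSN REDACTED]", text)
    text = CARD_PATTERN.sub("[CARD REDACTED]", text)
    return text


if __name__ == "__main__":
    note = "Patient SSN 123-45-6789 paid with card 4111 1111 1111 1111."
    print(mask_identifiers(note))
```

Pattern matching of this kind only catches identifiers that follow a predictable format; names, photographs, and free-text medical information would need other techniques.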
III. BDA Applications in Healthcare
     A. Big Data Analytics
Big Data Analytics (BDA) is the process of examining Big Data for hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information (Chen, Chiang, & Storey, 2012). Today, BDA is in growing demand in many fields: education, manufacturing, marketing, politics, healthcare, security, etc. Many companies offer analytics products; typical examples are IBM Watson, AWS (Amazon Web Services), R Project, and Tableau. This demand also creates plentiful employment opportunities in many organizations for big data talent with strong analytical skills (Sondergaard, 2015).
     B. BDA on fraud and breach
Society is constantly changing, especially in technology. Big Data Analytics has become a powerful tool for mining huge and complex data sets in many fields, and health care is no exception.
In 2012, CMS (Centers for Medicare & Medicaid Services) went further in fighting health care fraud and abuse. It used big data analytics in a twin-pillar approach to detect fraud before making payments. The first pillar is a fraud prevention system that applies advanced analytical methods, transforming data, particularly healthcare data, into insightful information with fast algorithms and historical data to detect fraudulent claims. The second pillar is an automated provider screening program that validates the eligibility of suppliers or providers in the CMS program.
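To make the idea of screening claims against historical data before payment more concrete, here is a minimal Python sketch. It is not the CMS fraud prevention system; the Claim fields, the flag_suspicious function, and the threshold of three times the historical median are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Claim:
    claim_id: str
    provider_id: str
    procedure_code: str
    billed_amount: float


def flag_suspicious(claims, history, ratio=3.0):
    """Return IDs of claims billed at more than `ratio` times the
    historical median amount for the same procedure code."""
    flagged = []
    for claim in claims:
        past_amounts = history.get(claim.procedure_code)
        if not past_amounts:
            continue  # no history for this code, so nothing to compare against
        if claim.billed_amount > ratio * median(past_amounts):
            flagged.append(claim.claim_id)
    return flagged


if __name__ == "__main__":
    # Hypothetical historical billed amounts per procedure code.
    history = {"99213": [80.0, 95.0, 100.0, 110.0, 120.0]}
    claims = [
        Claim("C1", "P1", "99213", 105.0),
        Claim("C2", "P2", "99213", 450.0),
    ]
    print(flag_suspicious(claims, history))
```

A flagged claim in this sketch is only a candidate for human review before payment; a production system would combine many signals rather than a single price threshold.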
     To fight health care fraud, waste, and abuse, HHS and other organizations have used the following measures:
     - Increase funding to use BDA for detecting fraud and abuse.
     - Spend all funds recovered from fraud and abuse on further enforcement activities with BDA tools.
     - Prioritize spending on fraud and abuse control activities.
     - Increase the trust of patients and the public in e-healthcare systems.
     - Use BDA to reduce conflicts of interest for providers.
     - Apply BDA to establish clinical practice guidelines and routines.
     - Restrict BDA tools in industrial marketing practices.
     - Take a balanced approach to fraud and abuse control activities.
     These measures focus on human efforts across many organizations such as the Social Security Administration, CMS, HHS, hospitals, outpatient clinics, nursing homes, and rehabilitation centers.
     C. BDA on data privacy protection
            To protect data privacy, users can adopt data analytics in health informatics so that patients and healthcare providers can accurately access protected data, including (1) clinical health information, (2) public health information for policy makers, and (3) personal health information for physicians’ practice or patients’ own health habits. Also, social network analysis is the most sophisticated new analytic technique; it is designed to catch fraudsters who use identity theft to obtain health care services or benefits without authorization by tracking the ownership of providers (Health Policy Briefs, 2012).
According to the ICC/ESOMAR (European Society for Opinion and Marketing Research) International Code, the four basic ethical issues in research on Big Data are (1) autonomous collection of data, (2) data security, (3) information ownership, and (4) privacy. Individual privacy has become a primary ethical issue (Jagadish, Gehrke, Labrinidis, Papakonstantinou, Patel, Ramakrishnan & Shahabi, 2014). ICC/ESOMAR specifies in the Article that privacy policy, data collection, use of data, security of processing, rights of the respondents, and transborder transactions must be considered and that privacy must be protected appropriately.
As a student in the DCS (Doctor of Computer Science) program at CTU (Colorado Technical University), this student was required to take the Basic Institutional Review Board (IRB) course and the “Computer Science and Information Technology researchers” course in CITI (Collaborative Institutional Training Initiative) (Alexander, 2014). Both courses emphasize privacy protection. For example, the Common Rule (45 CFR 46, Subpart A), covered in section 6 of the CITI course on privacy and confidentiality, requires IRBs to determine that there are adequate provisions for protecting the privacy of subjects and maintaining the confidentiality of data. Therefore, corporations should develop a strong tradition of proactively developing ethical standards, building on the first ESOMAR Code of Marketing & Research Practice, published in 1948, and the first MRS self-regulatory code, published in 1954. Especially in Big Data research, researchers, scientists, practitioners, professionals, and particularly this student should comply with the ICC/ESOMAR International Code of Conduct, HIPAA, and CITI/IRB requirements by carefully following these guidelines in their practice (Voosen, 2015).
IV. Policy Proposal
            This policy proposal for healthcare organizations is designed to explain the consequences of data breaches for individuals and organizations. It emphasizes the importance of using big data analytics with security in mind. It also covers the importance of data privacy as well as the steps that organizational staff should take to comply with data privacy rules, HIPAA, and HHS (US Department of Health and Human Services) requirements.


     1.0 Purpose
This policy proposal is legally required and complies with HIPAA, HHS, and other rules, guidelines, and regulations for safeguarding individual health data; violations carry severe penalties and restrictions. Healthcare data such as PHI (protected health information) is a valuable organizational asset that must be identified, managed, shared, and protected for individual patients, health care providers, and institutions, e.g., hospitals, outpatient clinics, nursing homes, and rehabilitation centers. A data breach or lapse in data security may occur through illegal access by unauthorized persons or groups, loss in a natural disaster such as a fire or flood, a cyber attack, or the theft of mobile devices, e.g., smartphones or laptop computers (HSE, 2011; HHS, 2008).
     The purpose of this policy proposal is to ensure that a standardized management approach is in place in the event of a data breach. This policy proposal is mandatory for all users who access PHI or healthcare information; they must agree to abide by all terms and conditions stated in this policy proposal.
     2.0 Scope
            This policy proposal addresses the responsibilities, obligations, and duties that individuals, staff, service providers, contractors, third parties, and related organizations that access, use, store, or process PHI in the healthcare system must follow in their daily work. The policy proposal must be approved and authorized by the senior management of the organization.
     3.0 Legislation
            The policy proposal on data breach, data privacy, security, and governance of PHI/PII is based on the following regulations, laws, rules, and guidelines (Practical Law, 2016):
          - The HIPAA (the US Health Insurance Portability and Accountability Act) 
          - FTC Act (The Federal Trade Commission Act)
          - The Financial Services Modernization Act (Gramm-Leach-Bliley Act (GLB))
          - The HIPAA Omnibus Rule
          - The Security Breach Notification Rule
          - The Fair Credit Reporting Act
          - The Controlling the Assault of Non-Solicited Pornography and Marketing Act
          - The Electronic Communications Privacy Act
          - The Federal Communications Commission (FCC)
          - The Judicial Redress Act
          - The federal security and law enforcement laws
          - State privacy laws:
                        - Enacted the California Electronic Communications Privacy Act
                        - Enacted several amendments to security breach notification law
                        - Enacted A.B. 1541, etc.
     4.0 Consequences of the data breaches
Data breaches have led to criminal and civil charges against 301 individuals, including 61 doctors, nurses, and other licensed medical professionals, for their alleged participation in health care fraud and breaches involving approximately $900 million in false billings.
Individuals and staff who violate data rules may lose their jobs. Companies pay hefty fines and lose business and their public image with clients.
Entities such as individuals, staff, healthcare providers, hospitals, clinics, and insurance companies that violate HIPAA may face hefty civil and criminal penalties.
          a. The individual did not know that the act violated HIPAA:
               - The minimum penalty is $100 per violation, with an annual maximum of $25,000 for repeated violations.
               - The maximum penalty is $50,000 per violation, with an annual maximum of $1.5 million for repeated violations.
          b. The violation was due to reasonable cause rather than willful neglect:
               - The minimum penalty is $1,000 per violation, with an annual maximum of $100,000 for repeated violations.
               - The maximum penalty is $50,000 per violation, with an annual maximum of $1.5 million for repeated violations.
          c. The violation was due to willful neglect but was corrected within the required timeframe:
               - The minimum penalty is $10,000 per violation, with an annual maximum of $250,000 for repeated violations.
               - The maximum penalty is $50,000 per violation, with an annual maximum of $1.5 million for repeated violations.
          d. The violation was due to willful neglect and was not corrected within the required timeframe:
               - The minimum penalty is $50,000 per violation, with an annual maximum of $1.5 million for repeated violations.
               - The maximum penalty is $50,000 per violation, with an annual maximum of $1.5 million for repeated violations.
          e. Covered entities are clearinghouses, providers, health plans, and their employees. They are held liable under HIPAA. The penalty for a data violation can include heavy fines and imprisonment for up to one year.
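The per-violation amounts and annual maximums above imply a simple capped calculation, which the following minimal Python sketch illustrates. The function name annual_penalty is hypothetical, and the sketch assumes, as a simplification, that every violation in a year is charged the same per-violation amount and that a single annual cap applies; actual penalties are set case by case by regulators.

```python
def annual_penalty(violations: int, per_violation: int, annual_cap: int) -> int:
    """Total civil penalty for one year: the per-violation amount times the
    number of violations, limited by the annual maximum."""
    if violations < 0:
        raise ValueError("violations must be non-negative")
    return min(violations * per_violation, annual_cap)


if __name__ == "__main__":
    # Unknowing violations at the minimum rate: $100 each, $25,000 annual cap.
    print(annual_penalty(40, 100, 25_000))
    print(annual_penalty(300, 100, 25_000))
    # The same 40 violations at the maximum rate: $50,000 each, $1.5 million cap.
    print(annual_penalty(40, 50_000, 1_500_000))
```

The example shows why the annual maximum matters: a large number of small violations stops accumulating once the cap is reached.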
            In addition to the penalties for HIPAA violations, individuals such as employees and contractors who violate the rule(s) may, depending on the severity, receive a warning and a black mark in their disciplinary record for the first data violation, be suspended or fined for the second, and be laid off without pension, dismissed, or terminated for the third PHI violation.
     5.0 Importance of data privacy
Data privacy is very important when practicing BDA on Big Data for useful insights. All users, including individuals, staff, healthcare providers, contractors, third parties, and organizations at all levels, must be trained in data privacy awareness.
          a. HIPAA patients’ rights
All users must understand and respect patients’ rights as shown below:
     - The right to receive notice of privacy practices from healthcare providers.
     - The right to see their protected health information and receive a copy.
     - The right to request changes to their records to correct errors or add information.
     - The right to have a list of PHI/PII.
     - The right to request confidential communication.
     - The right to complain.
          b. Main obligations
            Users have the primary obligation to comply with all HIPAA rules as follows:
HIPAA requires covered entities such as healthcare organizations and medical professionals to (1) use, disclose, and request the minimum quantity of PHI needed to complete a transaction; (2) implement data security protocols, procedures, and policies at the technical and administrative levels to protect data under the HIPAA Privacy Rule; and (3) comply with the standards set for electronic transactions. It also requires the entities to obtain a written consent form from data subjects and to provide a notice of privacy practices to data subjects (patients).
The guidance on Remote Use of and Access to Electronic Protected Health Information addresses the risk of accessing, storing, or transferring medical data on laptop and desktop computers, home PCs, wireless devices, flash memory drives, e-mail, and public workstations. Sample business associate agreements are provided by the Department of Health and Human Services.
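The "minimum quantity of PHI" obligation above can be illustrated with a small sketch. It assumes a hypothetical mapping from a transaction purpose to the fields that purpose needs; the purposes, field names, and the function minimum_necessary are illustrative assumptions and are not taken from the HIPAA text.

```python
# Hypothetical mapping from transaction purpose to the PHI fields it needs.
# Purposes and field names are illustrative assumptions, not HIPAA text.
ALLOWED_FIELDS = {
    "billing": {"patient_id", "insurer", "procedure_code", "amount_due"},
    "appointment": {"patient_id", "name", "phone"},
}


def minimum_necessary(record: dict, purpose: str) -> dict:
    """Return only the fields of `record` that `purpose` is allowed to see."""
    if purpose not in ALLOWED_FIELDS:
        # Fail closed: an unknown purpose receives no PHI at all.
        raise ValueError(f"no disclosure rule defined for purpose: {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


# Example record with made-up values.
record = {
    "patient_id": "P-001",
    "name": "Jane Doe",
    "phone": "555-0100",
    "diagnosis": "hypertension",
    "insurer": "Example Health",
    "procedure_code": "99213",
    "amount_due": 120.0,
}

billing_view = minimum_necessary(record, "billing")
```

In this sketch the billing view never contains the diagnosis or contact details, and a purpose without a defined rule is rejected instead of receiving the full record.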
     6.0 Policy
In gathering and exploiting big healthcare data, most data science projects are exploratory in nature and pose huge challenges. Companies often establish a multi-phase process of best practices to govern, manage, and control such projects effectively and efficiently. Similar to a software or hardware development process, or even a proposed dissertation research process, the basic Data Analytics Lifecycle (DAL) is an analytics process designed particularly for Big Data challenges and data science projects. According to EMC Education Services (2015), DAL consists of six phases, and project work can occur in several phases at once. The six phases are:
a. Discovery: Learn the business domain and determine the business problem.
b. Data Preparation: Gather data and perform ETLT (Extract, Transform, and Load, or Extract, Load, and Transform) on the data.
c. Model Planning: Determine methods, techniques, and workflow, and learn the relationship between variables.
d. Model Building: Develop datasets for testing, training, and production.
e. Results Communication: Determine the results or explain the outcome.
f. Operationalization: Deliver final reports, briefings, code, etc.
            In the event of a data breach, PHI/PII violation, or data privacy violation, the following breach management plan is strictly executed in five sequential stages (HHS, 2008; HSE, 2011):
1. Identification and Classification
2. Containment and Recovery
3. Risk Assessment
4. Notification of Breach          
5. Evaluation and Response
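The requirement that the five stages run strictly in sequence can be sketched as a small tracker. This is a minimal illustration only: the class BreachIncident and its methods are hypothetical names, not part of the HHS or HSE guidance.

```python
from enum import IntEnum


class BreachStage(IntEnum):
    """The five stages of the breach management plan, in order."""
    IDENTIFICATION_AND_CLASSIFICATION = 1
    CONTAINMENT_AND_RECOVERY = 2
    RISK_ASSESSMENT = 3
    NOTIFICATION_OF_BREACH = 4
    EVALUATION_AND_RESPONSE = 5


class BreachIncident:
    """Tracks one incident and enforces the stage order (hypothetical helper)."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.completed = []  # stages finished so far, in order

    @property
    def is_closed(self) -> bool:
        # The incident is closed once every stage has been completed.
        return len(self.completed) == len(BreachStage)

    def complete(self, stage: BreachStage) -> None:
        """Mark `stage` as done; reject stages that are out of sequence."""
        if self.is_closed:
            raise ValueError("all five stages are already complete")
        expected = BreachStage(len(self.completed) + 1)
        if stage is not expected:
            raise ValueError(f"expected {expected.name}, got {stage.name}")
        self.completed.append(stage)


# Walk one made-up incident through all five stages in order.
incident = BreachIncident("lost laptop containing unencrypted PHI")
for stage in BreachStage:
    incident.complete(stage)
```

Attempting to record Notification of Breach before Risk Assessment raises an error in this sketch, which mirrors the sequential execution the plan calls for.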
     7.0 Breach Management Plan
     7.1 Identification and Classification
            This stage requires any staff member to report any suspicious activity or data security breach to managers. A procedure for such reports must be in place for staff members. A data breach is an unintentional release of confidential information or PHI to unauthorized persons, or the accidental disclosure or theft of PHI/PII.
     7.2 Containment and Recovery
            Containment involves limiting the scope and impact of the data breach. If a data breach occurs, managers should (1) decide who should investigate the breach, (2) determine which department(s) need to be made aware of the problem and which measures should be used, and (3) determine how to recover the losses and limit the damage.
     7.3 Risk Assessment
            The manager should consider the potential consequences for staff members and individuals. In particular, the manager should consider (1) what type of data or information is involved, (2) how sensitive the data is, (3) whether any security mechanisms such as password protection or encryption are in place, (4) what the information could tell a third party about the individual, and (5) how many individuals are affected by the data breach.
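The five risk assessment questions can be captured in a simple record, as sketched below. The triage rule in priority(), including the sensitivity scale and the 500-individual threshold, is a hypothetical assumption added for illustration; the policy itself does not prescribe a scoring method.

```python
from dataclasses import dataclass


@dataclass
class BreachRiskAssessment:
    """One field per question in the risk assessment stage."""
    data_types: list           # (1) what type of data is involved
    sensitivity: str           # (2) "low", "medium", or "high" (assumed scale)
    was_protected: bool        # (3) password protection or encryption in place
    third_party_insight: str   # (4) what the data could reveal to a third party
    individuals_affected: int  # (5) how many individuals are affected

    def priority(self) -> str:
        """Hypothetical triage rule; the thresholds are illustrative only."""
        if self.was_protected and self.sensitivity == "low":
            return "low"
        if self.sensitivity == "high" or self.individuals_affected >= 500:
            return "high"
        return "medium"


# Example assessment with made-up values.
assessment = BreachRiskAssessment(
    data_types=["PHI"],
    sensitivity="high",
    was_protected=False,
    third_party_insight="diagnosis history of named patients",
    individuals_affected=12,
)
level = assessment.priority()
```

Recording the answers in one structure makes it harder to skip a question, and a real deployment would replace the triage rule with criteria agreed with the HIPAA office.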
     7.4 Notification of Breaches
          - All data breaches must be reported to the appropriate authority, such as Consumer Affairs or the Computer Security Incident Response Center (CSIRC).
          - CSIRC should inform other related agencies and notify the HHS Records Officer and third parties (e.g., media outlets and public and private sector agencies).
     7.5 Evaluation and Response
            At this stage, a thorough review of the data security breach must be performed to identify the areas where measures need to be improved. Any recommended change must be documented, implemented, and deployed right away. Managers should identify who is responsible for responding to breaches of data security.
     8.0 Roles and Responsibilities      
     8.1 Line Managers
            Line managers are responsible for (1) implementing this policy proposal within their business area, (2) making sure that all individuals and staff are instructed to comply with this policy proposal, and (3) consulting the HIPAA office and the CSIRC office about the appropriate follow-up procedures when a breach has occurred.
     8.2 Individual Users
            Each individual is responsible for (1) complying with the terms and rules of this policy, (2) respecting and protecting the confidentiality and privacy of the data and information they process at all times, and (3) reporting all breaches, abuse, and misuse of this policy to the line manager.
     9.0 Enforcement
            Violators who break the rules or conditions of this policy will be subject to disciplinary action. They will be denied access to organizational IT resources and may be suspended or dismissed under the disciplinary procedure.
     10.0 Review and Update
            The policy proposal’s author reserves the right to update and revise the content of the policy proposal as appropriate and on a regular basis, so that any changes in structure, reorganization, and business practices are reflected in this policy proposal.
Conclusion
In summary, this document provided a brief introduction to Big Data and Big Data Analytics (BDA). It described huge data sets in health care and discussed healthcare fraud, data breaches, the issue of data privacy, and the current regulations. In particular, the document focused on applying BDA to health care data to detect widespread healthcare fraud, fight security breaches, and protect data privacy. It presented a policy proposal with ten sections: purpose statement; scope; legislation; consequences of data breaches for individuals, staff, and organizations; the importance of data privacy; the policy; breach management plan; roles and responsibilities; enforcement; and review and update.

REFERENCES

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., & Shahabi, C. (2014). Big data and its technical challenges. Communications of the ACM, 57(7), 86-94. doi:10.1145/2611567

Alexander, M. (2014). What is the institutional review board (IRB) process? Presentation presented at the Doctoral Symposium of Colorado Technical University, Englewood, CO.

Bertolucci, J. (2014). Healthcare big data debate: public good vs. privacy. Retrieved November 21, 2016 from http://www.informationweek.com/big-data/big-data-analytics/healthcare-big-data-debate-public-good-vs-privacy/d/d-id/1316367

Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business intelligence and analytics: From big data to big impact. MIS quarterly, 36(4), 1165-1188.

EMC Education Services. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. John Wiley & Sons.

Gartner Group (2013). Gartner predicts business intelligence and analytics will remain a top focus for CIOs through 2017. Press Release. Las Vegas, NV. Retrieved June 4, 2015 from http://www.gartner.com/newsroom/id/2637615.

Health Net Federal Services (2016). Our commitment to fight health care fraud and abuse. Retrieved November 14, 2016 from https://www.hnfs.com/content/hnfs/home/tn/bene/claims/what_is_fraud.html

Health Policy Briefs (2012). Eliminating fraud and abuse. Retrieved November 14, 2016 from http://www.healthaffairs.org/healthpolicybriefs/brief.php?brief_id=72

HIPAA Act, (1996). The federal health insurance portability and accountability act. Retrieved October 19, 2015 from http://tn.gov/health/topic/hipaa.

HHS (US Department of Health and Human Services), (2008). Personally identifiable information (pii) breach response team. Retrieved November 22, 2016 from http://www.hhs.gov/ocio/policy/20080001.003.html

HSE (Health Service Executive), (2011). Data protection breach management policy. Retrieved November 23, 2016 from http://www.hse.ie/eng/services/Publications/pp/ict/Data_Protection_Breach_Management_Policy.pdf

Hurwitz, J., Nugent, A., Halper, F., & Kaufman, M. (2016). How to incorporate big data into the diagnosis of diseases. Retrieved October 09, 2016 from http://www.dummies.com/programming/big-data/how-to-incorporate-big-data-into-the-diagnosis-of-diseases/

Office of Public Affairs, Department of Justice (2016). National health care fraud takedown results in charges against 301 individuals for approximately $900 million in false billing. Retrieved November 14, 2016 from https://www.justice.gov/opa/pr/national-health-care-fraud-takedown-results-charges-against-301-individuals-approximately-900

Pallardy, C. (2015). 50 things to know about healthcare data security & privacy. Retrieved November 21, 2016 from http://www.beckershospitalreview.com/healthcare-information-technology/50-things-to-know-about-healthcare-data-security-privacy.html

Practical Law (2016). PLC - Data protection in the united states: overview. Retrieved November 21, 2016 from http://us.practicallaw.com/6-502-0467

Quora (2014). What is the future of business intelligence? Retrieved October 20, 2015 from http://www.quora.com/What-is-the-future-of-business-intelligence.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Retrieved October 09, 2016 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4341817/

Schneiderman, B., Plaisant, C., & Hesse, B. (2013). Improving healthcare with interactive visualization methods. Retrieved September 06, 2016 from https://www.cs.umd.edu/~ben/papers/Shneiderman2013Improving.pdf

Sondergaard, P. (2015). Gartner says big data creates big jobs. Retrieved December 7, 2015 from http://www.gartner.com/newsroom/id/2207915

Voosen, P. (2015). After facebook fiasco, big-data researchers rethink ethics. Chronicle Of Higher Education, 61(17), A14.