Page 116 - Data Science class 10

A database subset lets users in the field work with a small portion of a database (a maximum of 50,000 records).
        The database subset contains enough information to aid investigations, but its main role is to capture new
        information that is loaded into the main database on return to base. More generally, once a set is defined, any
        collection made up only of elements taken from that set is called a subset.
        Example: the set {1, 2, 3, 4, 5, 6, 7}

        A subset of this set is {1, 2, 3, 4}. Another subset is {3, 4, 5}, and yet another is {1}, etc.

        But {1, 8} is not a subset, since it has an element (8) which is not in the parent set.

        [Figure: a set containing the numbers 1 to 7 with a subset marked inside it; labels: A, B, Set, Subset]
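
        The subset test described above can be tried out with Python's built-in set type. This is only a small
        sketch: the variable names (parent, candidates, results, extra) are chosen here for illustration and do
        not come from the text.

```python
# Parent set from the example above
parent = {1, 2, 3, 4, 5, 6, 7}

# Collections to test against the parent set
candidates = [{1, 2, 3, 4}, {3, 4, 5}, {1}, {1, 8}]

# issubset() is True only when every element of the candidate is in the parent
results = {tuple(sorted(c)): c.issubset(parent) for c in candidates}

for members, is_subset in results.items():
    print(members, "is a subset" if is_subset else "is NOT a subset")

# Elements that stop {1, 8} from being a subset of the parent set
extra = {1, 8} - parent
print("Not in the parent set:", extra)
```

        The set difference on the last lines shows why {1, 8} fails the test: the element 8 is left over.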
        1.1.1. Notation
        Sets have a simple notation. We list each element (or "member"), separated by commas,
        and then put curly brackets around the whole list:

                                           { 3, 6, 91, ... }

        Each number listed is an "element" (or "member"; the two words mean the same thing). The three dots mean
        that the list goes on forever (the set is infinite).
        The curly brackets { } are sometimes called "set brackets" or "braces".

        1.1.2. Calculating the Number of Subsets

        The number of subsets can be determined from the number of elements in the set. For example, a set with 3
        elements has 2^3 = 8 subsets. Remember that the empty (or null) set and the set itself both count as
        subsets. The number of subsets is always 2^n, where n is the number of elements in the set.
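
        The counting rule can be checked by listing every subset and counting them. The sketch below assumes a
        hypothetical 3-element set {"a", "b", "c"}; the helper name all_subsets is also made up for this example.

```python
from itertools import combinations

def all_subsets(elements):
    """Return every subset of `elements` as a list of sets."""
    items = list(elements)
    subsets = []
    # Choose 0 items (the empty set), then 1 item, ..., up to all n items
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            subsets.append(set(combo))
    return subsets

three = {"a", "b", "c"}            # hypothetical 3-element set
subsets_of_three = all_subsets(three)
print(len(subsets_of_three))       # 8, which is 2 ** 3

# The same rule holds for the 7-element example set
seven = {1, 2, 3, 4, 5, 6, 7}
print(len(all_subsets(seven)) == 2 ** len(seven))
```

        Note that the list includes both the empty set and the full set, which is why the count reaches 2^n.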
        Suppose you have a table with 8 rows and 3 columns (the main set). A smaller table taken from it, such as
        Subset A with only 5 rows and 3 columns, is known as a “Subset” in Data Analytics. Two subsets are shown here,
        Subset A and Subset B. Subset B has only 3 rows and 2 columns, and it is a subset of the main set as well as a
        subset of Subset A.

                                Set                           Subset A                   Subset B
                          Col. 1 Col. 2 Col. 3            Col. 1 Col. 2 Col. 3          Col. 1 Col. 2
                   Row 1                            Row 1                        Row 1
                   Row 2                            Row 2                        Row 2
                   Row 3                            Row 3                        Row 3
                   Row 4                            Row 4
                   Row 5                            Row 5
                   Row 6
                   Row 7
                   Row 8
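
        The three tables above can be imitated in plain Python to confirm that Subset B sits inside Subset A,
        which in turn sits inside the main set. This is a sketch under stated assumptions: the cell values are
        placeholders, and the helper names take and cells are hypothetical.

```python
# Hypothetical main table: 8 rows x 3 columns; each cell is labelled by its position
columns = ["Col. 1", "Col. 2", "Col. 3"]
main_table = {
    f"Row {r}": {c: f"r{r}c{i}" for i, c in enumerate(columns, start=1)}
    for r in range(1, 9)
}

def take(table, row_names, col_names):
    """Keep only the requested rows and columns of a dict-of-dicts table."""
    return {r: {c: table[r][c] for c in col_names} for r in row_names}

# Subset A: rows 1 to 5, all three columns (5 x 3)
subset_a = take(main_table, [f"Row {r}" for r in range(1, 6)], columns)
# Subset B: rows 1 to 3, first two columns, taken from Subset A (3 x 2)
subset_b = take(subset_a, ["Row 1", "Row 2", "Row 3"], columns[:2])

def cells(table):
    """Flatten a table into a set of (row, column, value) triples."""
    return {(r, c, v) for r, row in table.items() for c, v in row.items()}

# True when B is inside A and A is inside the main table
print(cells(subset_b) <= cells(subset_a) <= cells(main_table))
```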

        1.1.3. Purpose of Subsetting

        Subsetting is a useful indexing feature for accessing the elements of a data object. It can be used for selecting
        and filtering variables (columns) and observations (rows). We subset a data frame to retrieve the part of the data
        that we need for a specific purpose. This lets us look at just the required data by filtering out unnecessary
        content. For example, subsetting lets you work with data that still contains all the necessary links between
        tables for your programs to function, but at a fraction of the cost.
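
        Filtering observations and selecting variables might look like the sketch below. It uses a plain list of
        dictionaries rather than a data frame library, and the records, field names, and values are all made up
        for illustration.

```python
# Hypothetical records; the field names and values are made up for illustration
records = [
    {"name": "Asha",  "city": "Delhi",  "marks": 82, "age": 15},
    {"name": "Ravi",  "city": "Mumbai", "marks": 67, "age": 16},
    {"name": "Meena", "city": "Delhi",  "marks": 91, "age": 15},
    {"name": "John",  "city": "Pune",   "marks": 74, "age": 16},
]

# Filter observations (rows): keep only the students from Delhi
delhi_rows = [row for row in records if row["city"] == "Delhi"]

# Select variables (columns): keep only the fields needed for the task
wanted = ["name", "marks"]
delhi_marks = [{field: row[field] for field in wanted} for row in delhi_rows]

print(delhi_marks)
```

        The result holds two rows and two fields, a much smaller object than the original four-row, four-field
        list.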

        Subsetting allows you to reduce the size of your dataset and to split the examples into disjoint sets for the
        purpose of training, validation, and testing. In research communities (for example, earth sciences, astronomy,
        business, and government), subsetting is the process of retrieving just the parts of large files which are of
        interest for a specific purpose. This usually occurs in a client-server setting, where the parts of interest are
        extracted on the server before the data is sent to the client over a network. In that setting, the main purpose
        of subsetting is to save bandwidth on the network and storage space on the client computer.
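
        Splitting examples into disjoint training, validation, and testing sets, as mentioned above, can be
        sketched as follows. The 70/15/15 proportions, the fixed seed, and the function name split_dataset are
        assumptions made for this example, not values given in the text.

```python
import random

def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of `examples` and cut it into three disjoint lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains
    return train, val, test

examples = list(range(100))                 # stand-in for 100 data records
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))

# Disjoint: no record appears in more than one subset
print(set(train).isdisjoint(val), set(train).isdisjoint(test), set(val).isdisjoint(test))
```

        Because each record lands in exactly one slice of the shuffled list, the three subsets never overlap
        and together they cover the whole dataset.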
          114   Touchpad Data Science-X