Page 116 - Data Science class 10
A database subset lets users in the field work with a small part of a database (at most 50,000 records).
The database subset holds enough information to aid investigations, but its main role is to capture new
information that is loaded into the main database on return to base. More generally, once we define a set,
any collection of elements taken from that set is called a subset.
Example: the set {1, 2, 3, 4, 5, 6, 7}
A subset of this set is {1, 2, 3, 4}. Another subset is {3, 4, 5}, and yet another is {1}, etc.
But {1, 8} is not a subset, since it has an element (8) which is not in the parent set.
[Figure: diagram of a set and a subset; labels: B, A, 1 2 3 4 5 6 7, Set, Subset]
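The subset test in the example above can be sketched in Python using the built-in set type. The function name is_subset is an illustrative choice, not something defined in this chapter; it simply checks that every element of the candidate also belongs to the parent set.

```python
# The parent set and the candidate subsets from the example above.
parent = {1, 2, 3, 4, 5, 6, 7}
candidates = [{1, 2, 3, 4}, {3, 4, 5}, {1}, {1, 8}]


def is_subset(candidate, parent_set):
    """Return True when every element of candidate is also in parent_set."""
    return candidate.issubset(parent_set)


results = [is_subset(c, parent) for c in candidates]
for c, ok in zip(candidates, results):
    print(sorted(c), "is a subset" if ok else "is not a subset")

# The elements that stop {1, 8} from being a subset of the parent set.
extra = {1, 8} - parent
print(extra)
```

The set difference on the last lines shows why {1, 8} fails the test: it leaves behind the element 8, which the parent set does not contain.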
1.1.1. Notation
Sets have a simple notation. We list each element (or "member"), separated by commas,
and then put curly brackets around the whole list:
{ 3, 6, 91, ... }
Here 3, 6 and 91 are elements of the set, and the three dots mean that the list goes on forever (the set is
infinite). "Element" and "member" mean the same thing.
The curly brackets { } are sometimes called "set brackets" or "braces".
1.1.2. Calculating the Number of Subsets
The number of subsets can be determined from the number of elements in the set. For example, if a set has 3
elements, it has $2^3 = 8$ subsets. Remember that the empty (or null) set and the set itself both count as
subsets. In general, the number of subsets is always $2^n$, where $n$ is the number of elements in the set.
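The counting rule can be checked by listing every subset of a small set. The sketch below assumes an illustrative 3-element set {1, 2, 3}, which is not taken from the text, and uses the standard-library function itertools.combinations to build the subsets of each size from 0 up to n.

```python
from itertools import combinations


def all_subsets(elements):
    """Return every subset of elements, each one as a tuple."""
    items = list(elements)
    subsets = []
    # Size 0 gives the empty set; size len(items) gives the set itself.
    for size in range(len(items) + 1):
        subsets.extend(combinations(items, size))
    return subsets


example = [1, 2, 3]  # illustrative 3-element set (an assumption)
subsets = all_subsets(example)
for s in subsets:
    print(set(s) if s else "{} (empty set)")
print("Number of subsets:", len(subsets))
```

The loop includes size 0 and size n on purpose, so the empty set and the full set are both counted, as the rule requires.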
Suppose you have a table with 8 rows and 3 columns. A smaller table taken from it, with only 5 rows and 3
columns (Subset A), is known as a "subset" in data analytics. Here we show two subsets, Subset A and Subset B.
Subset B has only 3 rows and 2 columns; it is a subset of the main set as well as a subset of Subset A.
Table      Rows              Columns
Set        Row 1 to Row 8    Col. 1, Col. 2, Col. 3
Subset A   Row 1 to Row 5    Col. 1, Col. 2, Col. 3
Subset B   Row 1 to Row 3    Col. 1, Col. 2
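The relationship between the three tables can be sketched by treating each table as a set of cells. The sizes are read from the table above (Set: rows 1 to 8 with 3 columns, Subset A: rows 1 to 5 with 3 columns, Subset B: rows 1 to 3 with 2 columns). The helper name cells is hypothetical, and the sketch compares only row and column labels, not cell values.

```python
def cells(rows, cols):
    """Return the set of (row, column) labels that make up a table."""
    return {(r, c) for r in rows for c in cols}


# Table shapes as read from the table above.
main_set = cells(range(1, 9), ["Col. 1", "Col. 2", "Col. 3"])  # 8 rows, 3 columns
subset_a = cells(range(1, 6), ["Col. 1", "Col. 2", "Col. 3"])  # 5 rows, 3 columns
subset_b = cells(range(1, 4), ["Col. 1", "Col. 2"])            # 3 rows, 2 columns

# The <= operator on Python sets means "is a subset of".
a_in_set = subset_a <= main_set
b_in_set = subset_b <= main_set
b_in_a = subset_b <= subset_a

print("Subset A is a subset of Set:", a_in_set)
print("Subset B is a subset of Set:", b_in_set)
print("Subset B is a subset of Subset A:", b_in_a)
```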
1.1.3. Purpose of Subsetting
Subsetting is a useful indexing feature for accessing the elements of an object. It can be used to select and
filter variables and observations. We subset a data frame to retrieve the part of the data that we
need for a specific purpose. This lets us look at just the required data by filtering out unnecessary content.
For example, subsetting allows you to work with data that still contains all the necessary links between tables
for your programs to function, but at a fraction of the cost.
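Selecting variables and filtering observations can be sketched in plain Python, with a list of dictionaries standing in for a data frame. The records, the field names and the helpers subset_rows and subset_columns are all invented for illustration; a data frame library would offer its own equivalents.

```python
# Hypothetical records; the names and values are invented for illustration.
students = [
    {"name": "Asha", "grade": 10, "score": 82},
    {"name": "Ravi", "grade": 9, "score": 74},
    {"name": "Meera", "grade": 10, "score": 91},
    {"name": "John", "grade": 10, "score": 58},
]


def subset_rows(records, condition):
    """Keep only the observations (rows) for which condition(record) is True."""
    return [r for r in records if condition(r)]


def subset_columns(records, fields):
    """Keep only the chosen variables (columns) of every record."""
    return [{f: r[f] for f in fields} for r in records]


# Filter observations: students in grade 10.
grade_10 = subset_rows(students, lambda r: r["grade"] == 10)

# Filter again by score, then select only two variables.
high_scores = subset_columns(
    subset_rows(grade_10, lambda r: r["score"] >= 80),
    ["name", "score"],
)
print(high_scores)
```

Filtering rows and selecting columns are kept as separate steps here so that each one can be reused on its own.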
A subset lets you reduce the size of your dataset and split the examples into disjoint sets for training,
validation, and testing. In research communities (for example, earth sciences, astronomy, business, and
government), subsetting is the process of retrieving just the parts of large files that are of interest for a
specific purpose. This usually occurs in a client-server setting, where the parts of interest are extracted on the
server before the data is sent to the client over a network. Here the main purpose of subsetting is to save
network bandwidth and storage space on the client computer.
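Splitting examples into disjoint training, validation and testing sets can be sketched as follows. The function name split_dataset, the 70/15/15 proportions and the fixed random seed are assumptions chosen for illustration; the text does not prescribe any particular split.

```python
import random


def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the examples and cut them into three disjoint lists."""
    shuffled = list(examples)
    # A seeded generator makes the shuffle repeatable.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, val, test


# 100 illustrative examples, identified by the numbers 0 to 99.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Because the three lists are consecutive slices of one shuffled list, no example can land in more than one of them, and together they cover the whole dataset.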
114 Touchpad Data Science-X

