## Program to perform Linear Search

**Linear Search:**

Linear search involves searching the element from a given set of elements beginning from the first element upto the last element or upto its first occurrence if it is available in the given set. The efficiency of Linear Search is O(n). The program to perform linear search is given below:

/* Linear Search */

#include <stdio.h>

#define MAX 10

int main(){

int arr[MAX],n,i,num,val;

printf(“nEnter total numbers?”);

scanf(“%d”,&n);

for(i=0;i<n;i++){

printf(“nEnter number?”);

scanf(“%d”,&arr[i]);

}

printf(“nEnter the value to search?”);

scanf(“%d”,&val);

for(i=0;i<n;i++){

if(arr[i]==val){

printf(“nNumber found.”);

return;

}

}

printf(“nNumber not found.”);

}

## Arrays

- Linear arrays
- Memory representation/address calculation
- Operations
- Traversal
- Search
- Insertion
- Deletion

- Two Dimensional arrays

An array is a set of homogenous elements (of the same type) referenced under one name along with the subscript. The subscript is the index or the position that contains the element. In ‘C’, the subscript value begins with zero. Arrays may be one-dimensional, two-dimensional or multi-dimensional. Generally one-dimensional and two-dimensional arrays are used to represent real world entities for solving problems pertaining to a given domain. A one dimensional array is declared in “C” as follows:

int arr[10];

The above declaration declares an integer array of 10 integers referred to by the name arr. Similarly two dimensional arrays may be declared:

int arr[5][6];

The above declaration declares a two dimensional array, arr of 5 rows and 6 columns. Total integers that may be stored in the above array are 5 x 6 = 30. An array may be represented as an Abstract Data Type.

The address of the first element of the array is called its base address. Arrays are by default passed to functions by reference using this base address. Once the base address is available, remaining elements of the array may be accessed by manipulating the subscript identifier. In ‘C’, arr[i] is equivalent to writing *(arr+i), which refers to the i^{th} element of the array. Two dimensional arrays may be represented using row-major form, in which all the elements of the same rows are scanned first or in column-major form in which all the elements of the same column are scanned first beginning from the first column. We can perform various operations on arrays which will be explained in separate posts with relevant ‘C’ program.

## Introduction to Algorithm Design

1. Introduction to Algorithms

- Input, output, definiteness, finiteness, effectiveness.
- Top-down & Bottom-up approaches
- Algorithm complexity
- Space complexity
- Time complexity
- Time space trade off

- Big ‘O’ notation

An *algorithm* defines some process, how an operation works. It is specified in terms of simple steps. Each step must be clearly implementable in a mechanical way, perhaps by a single machine instruction, by a single programming language statement or by a previously defined algorithm. An algorithm to solve a problem must be *correct*; it must produce correct output when given valid input data. When the input data is invalid we usually want the algorithm to warn us and terminate gracefully. Often we want to know how much time an algorithm will take or how much space (computer memory) it will use and this leads to the analysis of algorithms and to the field of computational complexity.

Algorithms and data structures can be specified in any adequately precise language. English and other natural languages are satisfactory if used with care to avoid ambiguity but more precise mathematical languages and programming languages are generally preferred. The execution of the latter can also be automated. A *program* is the expression of some algorithm(s) and data structure(s) in a particular programming language. The program can be used on all types of computer for which suitable language compilers or interpreters exist, making it more valuable.

The basic features of an algorithm includes appropriate input, output, definiteness, finiteness and effectiveness. An algorithm must operate upon a given set of inputs to give desired output. An algorithm must definitely result in an output that is either known or desired from the execution of a program. An algorithm must be finite in itself designed within appropriate boundaries. An algorithm must be effective and efficient enough in terms of space and time considerations to minimize the complexity inherent in its implementation.

**Top down and Bottom up Approaches:**

In the **Top-down** model an overview of the system is formulated, without going into detail for any part of it. Each part of the system is then refined by designing it in more detail. Each new part may then be refined again, defining it in yet more detail until the entire specification is detailed enough to validate the Model. The “Top-down” Model is often designed to get an overview of the proposed system, but insufficient and irrelevant in understanding the elementary mechanisms.

By contrast in **Bottom-up** design individual parts of the system are specified in detail. The part are then linked together to form larger components, which are in turn linked until a complete system is formed. Strategies based on the “bottom-up” information flow seem potentially necessary and sufficient because they are based on the knowledge of all variables that may affect the elements of the system.

**Space Complexity:**

Space complexity theory measures the amount of space required to run an algorithm or a program. The better the time complexity of an algorithm is, the faster the algorithm will carry out its work in practice. Apart from time complexity, its space complexity is also important. This is essentially the number of memory cells required by an algorithm. A good algorithm keeps this number as small as possible.

**Instruction Space: **

Instruction space is the space for simple variables, fixed-size structure variables and constants.

Sometimes, it is essential to pre-determine the time required by an algorithm to execute successfully to give the desired result. Sometimes, it is required to estimate the time, a program may take for its execution, before it is actually executed. A good measure for the running time is the number of executed comparisons.

Time complexity theory measures the amount of time required to execute an algorithm or a program. Time complexity theory measures things in number of “steps”, or computer operations (like addition, subtraction, multiplication, assignment etc.). A count of the number of steps to run an algorithm is the “raw time complexity”. A raw time complexity of “4N^{2} – 5N + 41″ will be O(N^{2}). The complexity for constant values can all be reduced to one, say O(1).

The number of (machine) instructions which a program executes during its running time is called its time complexity. This number depends primarily on the size of the program’s input, that is approximately on the number of the strings to be sorted (and their length) and the algorithm used.

There is often a **time-space-tradeoff** involved in a problem, that is, it cannot be solved with few computing time and low memory consumption. One then has to make a compromise and to exchange computing time for memory consumption or vice versa, depending on which algorithm one chooses and how one parameterizes it.

Time complexities are always specified in the so-called **O-notation**in computer science. One way to describe complexity is by saying that the sorting method has running time **O(n ^{2})**. The expression O is also called

**Landau’s symbol.**

Mathematically speaking, O(n^{2}) stands for a set of functions, exactly for all those functions which, “in the long run”, do now grow faster than the function n^{2}, that is for those functions for which the function n^{2} is an upper bound (apart from a constant factor). To be precise, the following holds true: A function f is an element of the set O(n^{2}) if there are a factor c and an integer number n_{0 }such that for all n equal to or greater than this n_{0 }the following holds:

f(n) <= c . n^{2}

A function f from O(n^{2}) may grow considerably more slowly than n^{2} so that, mathematically speaking, the quotient f/n^{2} converges to 0 with growing n. An example of this is the function f(n) = n.

**Ways to calculate frequency/efficiency of an algorithm:**

**Eg. (i):**

for i = 1 to n

for j = 1 to n

i j No. of times

1 1 to n n

2 2 to n n-1

3 3 to n n-2

– – –

– – –

– – –

n-1 n-1 to n 2

n n to n 1

Therefore,

1+2+3+……………+(n-1)+n

=n(n+1)/2 (Sum of n terms)

Frequency, f(n) = n(n+1)/2 = n^{2}/2 + n/2 = 0.5 n^{2} + 0.5 n

A higher value of n^{2} will be most effective. 0.5 is a negligible value.

Therefore in Big(O) notation, f(n) = O(n^{2}).

**Eg. (ii):**

for i = 1 to n

sum = sum + I;

Big (O) Notation, f(n) = O(n)

Linear Search:

1. **Best Case:** When search value is found at first position

f(n) = 1

2. **Worst Case:** Value not in array or in the end

f(n) = n (n number of comparisons are made).

3. **Average Case:** 1, 2, 3, ………………………..n

(1 + 2 + 3 + ……………………… + n) / n

=n(n+1)/2.n = (n+1)/2

=n/2 + 1/2

=0.5 n + 0.5

f(n) = O(n)

Thus Big-O Notation is a useful measurement tool for measuring time complexity.

## Basic concepts of Data Representation

1. Abstract and System defined data types

- Data type
- Primitive data type
- Abstract data type
- Polymorphic data type

2. Representation, Primitive data structures

- Data structures
- Linear data structures
- Arrays
- Linear
- Two dimensional

- Linked List
- Linear
- Doubly
- Circular
- Stacks
- Queues

- Arrays
- Non-linear data structures
- Trees
- Binary trees
- Binary search trees
- Heaps

- Graphs
- Hash tables

- Trees

- Linear data structures

- Common operations on Data structures
- Traversal
- Searching
- Insertion
- Deletion
- Sorting
- Merging

**Data type:** A type stands for a set of values. A set of values that correspond to a specific type, say a set of integers belong to the type integer, are collectively referred by using a data type. A data type specifies the type of data required to be stored in a variable.

In ‘C’, variables must be declared before their use and must be declared with a relevant data type, which specifies the range of values that may be stored in the variable of that data type. In ‘C’, data type specifies the amount of memory to be allocated. For example: char allocates 1 byte, int – 2bytes, float – 4bytes, double – 8 bytes and long double – 10bytes on a specific machine. Data types may be Primitive (fundamental), Abstract or Polymorphic data types.

The following are the characteristics of data types:

- A domain is defined – the set of all possible values.
- A set of allowable operations on those values is also defined.
- A value is atomic within the domain.

**Primitive Data Type:** A primitive data type say char, int, float, double is the data type that is used to refer to a single value such as integer, float, character etc. ‘C’ provides above four basic data types. A variable in ‘C’ may be declared to be of primitive data type, such as int x; declares a variable x of type integer and allocates 2 bytes of memory for storing an integer value.

**Abstract Data Type:** ADT may be defined as a set of data values and associated operations that are precisely specified independent of any particular implementation. Thus an Abstract Data Type is an organized collection of information and a set of operations used to manage that information. The set of operations defines the interface of the ADT. As long as the ADT fulfills the conditions of the interface, it doesn’t really matter how the ADT is implemented. Since, in ADT, the data values and operations are defined with mathematical precision, rather than as an implementation in a computer language, we may reason about effects of the operations, relations to other abstract data types whether a program implements the data type etc. One of the simplest abstract data type is the stack data type for which functions might be provided to create an empty stack, to push values onto a stack and to pop values from a stack.

The basic difference between abstract data type (ADT) and concrete data type is that the latter allow us to look at the concrete representation, whereas the former hide the representation from us. An ADT may be pure ADT or Updatable ADT. A pure ADT is one where all operations are pure functions. This means that operations have no side effects. In particular, they do not modify or update there input arguments. They just use these arguments to generate output, which are fresh values of ADT (or of other types). Most concrete types are pure. For example, no operation on integers actually modifies an integer. Instead, all the operations like ‘+’ produce fresh outputs.

An updatable ADT is one where some operations actually change values of the ADT. For example, suppose we had an operation called ‘pop’ that took a stack as an argument and modified it. (“in place”, “destructively”) by removing the highest priority item. This operation would be considered impure and the whole ADT would then be impure also. An ADT may be user defined ADT.

We know that an Abstract Data Type is a data type that satisfies the following two conditions:

1. The representation, or definition, of the type and the operations are contained in a single syntactic unit.

2. The representation of objects of the type is hidden from the program units that use the type, so only direct operations possible on those objects are those provided in the type’s definition.

A user defined Abstract Data Type should provide:

1. A type definition that allows program units to declare variables of the type, but hides the representation of these variables.

2. A set of operations for manipulating objects of the type.

An example of a user defined abstract data type is structure. ‘C’ provides four basic types: int, char, float and double. However, ‘C’ also provides the programmer with the ability to define his/her own types. Structure is one such example. A structure is an aggregate of different parts, where each part is of some existing type.

struct abc

{int x;

float y;

};

The above structure definition does not create any variables, rather it creates a new type. Variables of this type may be created in a similar way to variables of a built in type.

struct abc a;

The typedef keyword allows us to create new type names for our new types.

For example:

typedef struct abc AB;

where AB is a new type name that can now be used to create new types.

AB b;

**Polymorphic Data Types:** A polymorphic data type is a data type, which can be transformed to any distinct data type as required. An item of polymorphic data type is created by the use of a generic pointer. A generic pointer is simply a byte address without an associated type. In other words, a generic pointer does not point to an object of a specific type; it just points to some location in the memory of the computer. In ANSI C, a generic pointer is declared as a void pointer. It is only when the actual operations must be performed on the data that generic pointers are cast to pointers of specific types and de-referenced.

**Data Structures:** The following are the characteristic features of data structures:

1. It contains component data items, which may be atomic or another data structure (still a domain).

2. A set of operations on one or more of the component items.

3. Defines rules as to how components relates to each other and to the structure as a whole (assertions).

A data structure may be static or dynamic. A static data structure has a fixed size. This meaning is different from the meaning of static modifier. Arrays are static; once we define the number of elements it can hold, the number doesn’t change. A dynamic data structure grows and shrinks at execution time as required by its contents. A dynamic data structure is implemented using links.

Data structures may further be categorized into linear data structures and non-linear data structures. In linear data structures every component has a unique predecessor and successor, except first and last elements, whereas in case of non-linear data structures, no such restriction is there as elements may be arranged in any desired fashion restricted by the way we use to represent such types.

**Sequential Access vs. Direct Access:** In sequential access, to access the n-th element, you must access the preceding (n-1) elements. In direct (random) access, any element can be accessed without accessing its predecessor or successor.

Linear Data structures include arrays, structures, linked lists, stacks and queues. Non-linear data structures include trees, binary trees, graphs and digraphs.

1. **Arrays:** An array is said to be

- a collection of a fixed number of component values, each of which are of the same type, atomic and structured.
- a set of index (subscript) values that are non-negative integers (consecutive).
- there is a 1-1 relationship between such index and an array component (element). Index 0 selects the first element, 1 the second and so on.

**Difference in Array and Structure:**

- An array is a homogenous structure, whereas structure are heterogenous structures.
- A component of an array is accessed by its positions in the array, where a component of a structure is accessed by an identifier (the name).
- Because array components are accessed by position, an array is a structured composite type.

An array may be one dimensional or two dimensional. A one-dimensional array contains elements row-wise, from representation point of view. It contains only one subscript. The subscript value begins with zero. A two dimensional array contains two subscripts, one for row and the other for column. Thus a two dimensional array may be considered as a tabular representation of elements where elements are arranged in rows and columns. Two subscripts are used combinedly one for row position and other for column position, to refer to an element in a 2D array.

2. **Linked List:** A linked list is a linear list of elements linked/joined together with links, where link is maintained by keeping a reference/address of the next element. The address of first element is sustained for traversal or other operations to be performed on the list. A linked list is a dynamic structure. It may grow or shrink dynamically. A linked list may be singly linked list, doubly linked list, circular linked list or doubly circular linked list. Header lists may also be created that contains a header node.

A singly linked list is unidirectional since elements are added up or removed from left to right. A doubly linked list is an improved data structure form of singly linked list, since it contains both forward and backward links/reference. Thus, traversal may easily be done in either of the direction. A circular linked list is one in which the last node points to the first node, thereby making a loop or circle. Circular lists are useful in situations where processing is required to be done repetitively in a loop. A doubly circular linked list is that which maintains forward as well as backward reference and also contains loop by keeping the address of first node in last and last node in first.

All these list categories have their own merits and demerits depending upon the situation in which they are implemented.

3. **Stacks:** A stack ADT is also linear like a list or a queue. In a stack, items are added and removed from only one end of the stack, called the top of the stack. It is therefore a LIFO (Last In First Out) data structure. Examples of stacks are – a stack of plates in a cupboard, a stack of files in the drawer of a table. Stacks are often drawn vertically. Some operations that may be performed on stack are:

**Push:**adds an item to the top of the stack**Pop:**removes an item from the top of the stack**Peek(top):**retrieves the top item without removing it**Empty:**returns true if the stack is empty, false otherwise

**Queues:**A queue is similar to a list but adds items only to the rear of the list and removes them only from the front. It is called a FIFO data structure -First In First Out. An example of a queue is a line of people at a ticket window. We can define the following operations for a queue:

**enqueue:**adds an item to the rear of the queue**dequeue (or serve):**removes an item from the front of the queue**empty:**returns true if the queue is empty, false otherwise

Queues often are helpful in simulations or any situation in which items get backed up while awaiting processing. A queue can be represented by an array or using a linked list. A linked list representation is more efficient if the references point from the front toward the rear of the queue.

**Non-Linear Data Structures:**

1. **Trees:** A tree is a non-linear data structure that consists of a root node and potentially many levels of additional nodes that form a hierarchy. Nodes that have no children are called leaf nodes.

(i) **Binary Trees:** A binary tree is defined recursively. Either it is empty (the base case) or it consists of a root and two subtrees, each of which is a binary tree. Binary tree and trees are represented using references as dynamic links, though it is possible to used fixed representation like arrays.

(ii) **Binary Search Trees:** A Binary Search Tree is a binary tree that satisfies the following conditions:

- All values less than the key value (root) form a part of left subtree.
- All values greater than the key value (root) form a part of right subtree.
- The left and right subtrees are also binary trees.

(iii) **Heap:** Heap is a complete binary tree with n nodes such that:

- the root node contains the largest element
- both the left and the right sub-trees of the root are heaps once again

2. **Graphs:** A graph is a non-linear data structure. Unlike a tree or binary tree, a graph does not have a root. Any node in a graph can be connected to any other node by an edge. In a directed graph or digraph, each edge has a specific direction. Edges with direction sometimes are called arcs.

Both graphs and digraphs can be represented using dynamic links or using arrays. As always, the representation should facilitate the intended operations and make them convenient to implement.

3. ** Hash Tables:** A hash table is used to represent the hash values that are generated using a hash function. Generally, the size of the hash table is determined in terms of the data elements it is going to sustain in itself.

**Common operations on Data Structures:**

**Traversal****Searching****Insertion****Deletion****Sorting****Merging**

**Traversal:**It refers to reading the elements represented by a given data structure. The traversal in case of a singly linked list is from the first node until NULL is found, since it is a unidirectional list. Reversal traversal is achieved easily in a doubly linked list, since backward references are maintained. In a circular linked list, traversal can begin from any node that is a part of the list. In case of tree, traversal can be either in inorder, preorder or postorder form. Similarly, in graph each vertex node may be visited when the graph is traversed. Thus, simply stating, visiting each element in a given data structure is termed as traversal operation.

**Searching:**Searching is the process of locating a user defined value from a given list of values. Search operation may be performed on elements that are arranged/sorted as well as on unsorted elements. The search operation is considered efficient if it is performed on sorted elements.

**Insertion:**Insertion refers to inserting a new element in a given data structure. Insertion in an array requires subscript identification where this new element is to be stored. In case of linked list, insertion may be performed at beginning, end, before/after/at element, before/after/at position. Similar cases are applicable for other categories of linked list. In case of trees, insertion may be either as root, left child or right child depending upon the conditions given. In graphs, insertion of a node requires setting edges appropriately such that the new element inserted becomes a part of the graph.

**Deletion:**Deletion is the process of removing an element from a given data structure. The basic procedure followed for deletion is: locate the element in the given data structure you wish to delete. If the element is found, then assign its address in some other pointer of the same type. Make appropriate pointer adjustments to remove all links that connect this node in a given data structure. Then free the memory allocated to this element. Freeing the memory explicitly helps in better memory management, since memory issues are resolved and used memory when freed is again made available for use by deletion process.

**Sorting:**Sorting refers to a proper arrangement of data elements in the given data structure. Arrangement of elements may be done either in ascending order or descending order. Sorting sometimes may prove inefficient, when it involves swapping many items simultaneously for a wide range to input. Complexity of a search algorithm and its efficiency may be computed and checked as to which sorting algorithm out of the available sorting techniques such as bubble sort, merge sort, quick sort etc. is efficient under what set of given conditions. Detailed sorting techniques will be described later as a separate item.

**Merging:**Merging is the process of combining two given data structures say linked list or arrays such that an appropriate combination is formed by the merging process. If merging is performed such that the resultant data structure is in sorted form, then two linked lists are merged such that the linked so obtained after merging is the sorted linked list. However merging may be done in many other ways as appropriate under given set of conditions.