
HDFS (Hadoop Distributed File System): stores data across the cluster

MapReduce: processes the data on the cluster

 

RDD (Resilient Distributed Dataset)

- Core data structure in Spark

- Distributed, resilient, immutable (cannot be modified)

- Lazy evaluated: computed only when an evaluation command (an action) is issued (see the sketch after this list)

- Abstract Data Set

- Distribution is handled by the system.

- Fault recovery is handled by the system.
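A minimal PySpark sketch of these properties (assuming a local SparkContext; the app name and sample data are made up):

from pyspark import SparkContext

sc = SparkContext("local", "rdd-demo")   # hypothetical app name
rdd = sc.parallelize([1, 2, 3, 4, 5])    # distribution is handled by the system
doubled = rdd.map(lambda x: x * 2)       # immutable: returns a new RDD; lazy: nothing runs yet
print(doubled.collect())                 # the action triggers evaluation -> [2, 4, 6, 8, 10]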

 

Big Issues in Distributed Systems

Fault tolerance: how to recover automatically when one of the distributed machines fails

Hadoop : multiple copies

Spark : Lineage
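Spark records the chain of transformations (the lineage) and can recompute lost partitions from it. A quick way to inspect the lineage, assuming the sc above and a made-up pipeline:

pipeline = sc.parallelize([1, 2, 3]).map(lambda x: x * 2).filter(lambda x: x > 2)
print(pipeline.toDebugString())   # prints the recorded transformation chain used for recovery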

 

map(), filter(), reduce(): functions that process a list; map() and filter() return a new list, while reduce() collapses the list into a single value (a reduce() sketch follows the notation below)

map()

list [x, y, z] -> [f(x), f(y), f(z)] modified list

filter()

list [x, y, z] -> [x, y], keeping only the elements for which the condition is true
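A minimal reduce() sketch; in Python 3 it lives in functools (the sample numbers are made up):

from functools import reduce

items = [1, 2, 3, 4, 5]
total = reduce(lambda x, y: x + y, items)   # ((((1+2)+3)+4)+5)
print(total)                                # 15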

 

groupBy(): groups elements by key

lambda function: a small anonymous function

Why use it: when you don't want to keep a named function around in memory
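A minimal sketch of RDD groupBy() with a lambda as the key function (assuming the sc from above; the data is made up):

rdd = sc.parallelize([1, 2, 3, 4, 5])
grouped = rdd.groupBy(lambda x: x % 2).mapValues(list).collect()
print(grouped)   # e.g. [(0, [2, 4]), (1, [1, 3, 5])]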

 

** map() Example

items = [ 1, 2, 3, 4, 5 ]
squared = list(map(lambda x: x**2, items))
print(squared)

** map() Example - tuple / set

names = ['krunal', 'ankit', 'rushabh', 'dhaval', 'nehal']
convertedTuple = tuple(map(lambda s: str(s).upper(), names))
print(convertedTuple)

strings = ['krunal', 'ankit', 'rushabh', 'dhaval', 'nehal']
convertedSet = set(map(lambda s: str(s).upper(), strings))
print(convertedSet)

 

filter() function

Filter extracts each element in the sequence for which the function returns True

Syntax : filter(function, iterable)
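A minimal filter() sketch in the same style as the map() examples above (the sample numbers are made up):

items = [1, 2, 3, 4, 5]
evens = list(filter(lambda x: x % 2 == 0, items))
print(evens)   # [2, 4]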

 

* range(): does not hold all its values in memory (they are generated on demand)
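A quick way to see this (exact sizes are CPython-specific and may vary):

import sys

print(sys.getsizeof(range(10)))           # small, constant-size object
print(sys.getsizeof(range(10**9)))        # same size: values are generated lazily
print(sys.getsizeof(list(range(10**4))))  # much larger: a list materializes every element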

 

reduceByKey(), groupByKey() Example

# assumes the sc from the sketches above; the sample pairs are made up
wordPairsRDD = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])

wordsCountWithReduce = wordPairsRDD.reduceByKey(lambda x, y: x + y).collect()
print(wordsCountWithReduce)

wordsCountsWithGroup = wordPairsRDD.groupByKey().map(lambda x: (x[0], sum(x[1]))).collect()
print(wordsCountsWithGroup)

The two snippets above show two different ways of producing the same result.

 

Lazy Evaluation

The same holds for other transformations: they are lazy and compute their result only when it is accessed.
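A sketch of this laziness with the sc assumed above: the map() below only records a plan, and nothing executes until an action is called:

rdd = sc.parallelize(range(5))
mapped = rdd.map(lambda x: x * 2)   # transformation: recorded, not executed
result = mapped.collect()           # action: triggers the actual computation
print(result)                       # [0, 2, 4, 6, 8]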

 

Cogroup()

Given two keyed RDDs, groups all values with the same key

returns a triple (k, X-values, Y-values) for every key k, where X-values are all the values found under key k in X, and Y-values likewise for Y.
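A minimal cogroup() sketch (assuming sc; the keys and values are made up). In PySpark the grouped values come back as iterables, so they are converted to lists for printing:

X = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])
Y = sc.parallelize([("a", 9), ("c", 7)])
grouped = X.cogroup(Y).mapValues(lambda v: (list(v[0]), list(v[1]))).collect()
print(grouped)   # e.g. [('a', ([1, 2], [9])), ('b', ([3], [])), ('c', ([], [7]))]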

 

Join()

Given two keyed RDDs, returns all matching items in two datasets

i.e., a triple (k, x, y) for every (k, x) in X and (k, y) in Y sharing the same key k

leftOuterJoin, rightOuterJoin, fullOuterJoin
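A minimal join() sketch reusing the X and Y above; note that PySpark actually returns pairs (k, (x, y)) rather than flat triples, and output order may vary:

joined = X.join(Y).collect()
print(joined)                        # [('a', (1, 9)), ('a', (2, 9))]
print(X.leftOuterJoin(Y).collect())  # keeps every key of X; missing Y-values become None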

 

Drivers and Executors

Driver delegates tasks to executors to use cluster resources.

In local mode, executors are collocated with the driver.

In cluster mode, executors run on other machines.
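Where executors run is decided by the master setting; a sketch of the two modes (the URL and app name are placeholders, and this assumes a fresh session, since only one SparkContext can be active):

from pyspark import SparkConf, SparkContext

# local mode: driver and executors share one machine (here, all local cores)
conf = SparkConf().setMaster("local[*]").setAppName("demo")

# cluster mode: executors run on worker machines behind a cluster manager
# conf = SparkConf().setMaster("spark://master-host:7077").setAppName("demo")

sc = SparkContext(conf=conf)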

 
