

[Software Engineering Practice] Pig Project 12: Source Code Analysis of the data Directory, Other Tuples (Part 2)

2021SC@SDUSC

The previous post covered AmendableTuple. This post continues with the other tuple types.

Other Tuples

AppendableSchemaTuple

AppendableSchemaTuple is an abstract class. Its UML diagram is as follows:

Inheritance:

public abstract class AppendableSchemaTuple<T extends AppendableSchemaTuple<T>> extends SchemaTuple<T>

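It extends SchemaTuple, which we have not encountered before. This class has no comments and no tests, and, being abstract, it does not even declare a constructor, so we can only take a rough look at its public methods: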

@Override
public void append(Object val) {
    if (appendedFields == null) {
        appendedFields = mTupleFactory.newTuple();
    }

    appendedFields.append(val);
}

public SchemaTuple<T> set(List<Object> l) throws ExecException {
    int listSize = l.size();
    int schemaSize = schemaSize();

    if (listSize < schemaSize) {
        throw new ExecException("Given list of objects has too few fields ("+l.size()+" vs "+schemaSize()+")");
    }

    Iterator<Object> it = l.iterator();

    generatedCodeSetIterator(it);

    resetAppendedFields();

    while (it.hasNext()) {
        append(it.next());
    }

    return this;
}

public void set(int fieldNum, Object val) throws ExecException {
    int diff = fieldNum - schemaSize();
    if (diff >= 0 && diff < appendedFieldsSize()) {
        setAppendedField(diff, val);
    } else {
        super.set(fieldNum, val);
    }
}

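set(List<Object>) loads the elements of a list into the tuple: it rejects a list shorter than the schema, fills the schema fields through the generated generatedCodeSetIterator (which presumably consumes the first schemaSize() elements), clears any previously appended fields, and appends whatever elements remain. set(int, Object) applies the same split to a single field: an index that falls past the end of the schema but within the appended fields goes to setAppendedField, and everything else is handed to the parent implementation.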
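To make the split between schema fields and appended fields concrete, here is a minimal, self-contained sketch of the same logic. It is not Pig code: the class name AppendableTupleSketch, the Object[] standing in for the generated schema fields, and the use of IllegalArgumentException/IndexOutOfBoundsException instead of ExecException are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch (not Pig's class): fixed schema fields plus an
// overflow list of appended fields, mirroring AppendableSchemaTuple.
class AppendableTupleSketch {
    private final Object[] schemaFields;        // stands in for generated fields
    private List<Object> appendedFields = null; // created lazily on first append

    AppendableTupleSketch(int schemaSize) {
        this.schemaFields = new Object[schemaSize];
    }

    int schemaSize() { return schemaFields.length; }

    int appendedFieldsSize() { return appendedFields == null ? 0 : appendedFields.size(); }

    int size() { return schemaSize() + appendedFieldsSize(); }

    void append(Object val) {
        if (appendedFields == null) {
            appendedFields = new ArrayList<>();
        }
        appendedFields.add(val);
    }

    // Bulk set: the list must cover the schema; extra elements are appended.
    AppendableTupleSketch set(List<Object> l) {
        if (l.size() < schemaSize()) {
            throw new IllegalArgumentException("Given list of objects has too few fields ("
                    + l.size() + " vs " + schemaSize() + ")");
        }
        Iterator<Object> it = l.iterator();
        for (int i = 0; i < schemaSize(); i++) {
            schemaFields[i] = it.next(); // role of generatedCodeSetIterator
        }
        appendedFields = null;           // role of resetAppendedFields
        while (it.hasNext()) {
            append(it.next());
        }
        return this;
    }

    // Per-field set: indices past the schema go to the appended fields.
    void set(int fieldNum, Object val) {
        int diff = fieldNum - schemaSize();
        if (diff >= 0 && diff < appendedFieldsSize()) {
            appendedFields.set(diff, val);
        } else if (fieldNum >= 0 && fieldNum < schemaSize()) {
            schemaFields[fieldNum] = val;
        } else {
            throw new IndexOutOfBoundsException("No field " + fieldNum);
        }
    }

    Object get(int fieldNum) {
        int diff = fieldNum - schemaSize();
        if (diff >= 0 && diff < appendedFieldsSize()) {
            return appendedFields.get(diff);
        }
        if (fieldNum >= 0 && fieldNum < schemaSize()) {
            return schemaFields[fieldNum];
        }
        throw new IndexOutOfBoundsException("No field " + fieldNum);
    }
}
```

For example, with a two-field schema, setting the list ("a", 1, 2.5) stores "a" and 1 in the schema slots and 2.5 as the first appended field, so the tuple reports size 3.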

Next, let's look at its parent class.

SchemaTuple

Inheritance:

public abstract class SchemaTuple<T extends SchemaTuple<T>> extends AbstractTuple implements TypeAwareTuple {

UML

The diagram is very large, so only part of it is shown.

Some of the comments:

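/**
 * A SchemaTuple is a type-aware tuple that is faster and more memory efficient.
 * In our implementation, given a Schema, code generation is used to extend this class.
 * This class provides a broad range of functionality that minimizes the complexity
 * of the code that must be generated. The odd generic signature allows for certain
 * optimizations, such as "setSpecific(T t)", which allows us to do much faster sets
 * and comparisons when types match (since the code is generated, there is no other
 * way to know).
 */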

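This class is interesting. It is not a code generator itself but the base class that generated code extends: given a Schema, a subclass of it is generated, presumably as part of making Pig Latin execution faster. A large share of its member methods exist to support that code generation, which frankly goes beyond data-structure analysis and into system implementation. Some of them are abstract, such as the methods AppendableSchemaTuple implements above.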
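The "odd generic signature" in the comment is a self-referential type parameter: `SchemaTuple<T extends SchemaTuple<T>>` lets each generated subclass name itself as T, so a method like setSpecific(T t) can accept exactly that subclass and copy its fields directly. Below is a minimal sketch of the pattern. Apart from setSpecific, every name (SelfTypedTuple, IntPairTuple, setGeneric, isSpecific, setFrom, fastCopies) is hypothetical and not part of Pig's API.

```java
import java.util.List;

// Hypothetical sketch of the self-referential generic used by SchemaTuple.
// T is the concrete subclass itself, so setSpecific takes exactly that type.
abstract class SelfTypedTuple<T extends SelfTypedTuple<T>> {
    int fastCopies = 0; // how often the typed fast path was taken

    abstract void setGeneric(int idx, Object val); // slow path: one Object at a time
    abstract void setSpecific(T other);            // fast path: same concrete type
    abstract boolean isSpecific(Object other);     // is 'other' exactly a T?

    @SuppressWarnings("unchecked")
    void setFrom(Object other) {
        if (isSpecific(other)) {
            fastCopies++;
            setSpecific((T) other); // safe: isSpecific confirmed the type
        } else if (other instanceof List) {
            List<?> values = (List<?>) other;
            for (int i = 0; i < values.size(); i++) {
                setGeneric(i, values.get(i));
            }
        } else {
            throw new IllegalArgumentException("Unsupported source: " + other);
        }
    }
}

// Stand-in for a generated subclass for the schema (int, int).
final class IntPairTuple extends SelfTypedTuple<IntPairTuple> {
    int a, b; // primitive fields: the fast path copies them without boxing

    @Override
    void setGeneric(int idx, Object val) {
        int v = ((Number) val).intValue();
        if (idx == 0) a = v;
        else if (idx == 1) b = v;
        else throw new IndexOutOfBoundsException("No field " + idx);
    }

    @Override
    void setSpecific(IntPairTuple other) {
        a = other.a;
        b = other.b;
    }

    @Override
    boolean isSpecific(Object other) {
        return other instanceof IntPairTuple;
    }
}
```

Copying one IntPairTuple into another takes the typed path. A plain List still works, but only through the per-field Object path. This is the "faster when types match" behavior the comment describes.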

To understand it further, let's look at TypeAwareTuple, the interface that SchemaTuple implements. Its code is as follows:

public interface TypeAwareTuple extends Tuple {
    public void setInt(int idx, int val) throws ExecException;
    public void setFloat(int idx, float val) throws ExecException;
    public void setDouble(int idx, double val) throws ExecException;
    public void setLong(int idx, long val) throws ExecException;
    public void setString(int idx, String val) throws ExecException;
    public void setBoolean(int idx, boolean val) throws ExecException;
    public void setBigInteger(int idx, BigInteger val) throws ExecException;
    public void setBigDecimal(int idx, BigDecimal val) throws ExecException;
    public void setBytes(int idx, byte[] val) throws ExecException;
    public void setTuple(int idx, Tuple val) throws ExecException;
    public void setDataBag(int idx, DataBag val) throws ExecException;
    public void setMap(int idx, Map<String,Object> val) throws ExecException;
    public void setDateTime(int idx, DateTime val) throws ExecException;

    public int getInt(int idx) throws ExecException, FieldIsNullException;
    public float getFloat(int idx) throws ExecException, FieldIsNullException;
    public double getDouble(int idx) throws ExecException, FieldIsNullException;
    public long getLong(int idx) throws ExecException, FieldIsNullException;
    public String getString(int idx) throws ExecException, FieldIsNullException;
    public boolean getBoolean(int idx) throws ExecException, FieldIsNullException;
    public BigInteger getBigInteger(int idx) throws ExecException;
    public BigDecimal getBigDecimal(int idx) throws ExecException;
    public byte[] getBytes(int idx) throws ExecException, FieldIsNullException;
    public Tuple getTuple(int idx) throws ExecException;
    public DataBag getDataBag(int idx) throws ExecException, FieldIsNullException;
    public Map<String,Object> getMap(int idx) throws ExecException, FieldIsNullException;
    public DateTime getDateTime(int idx) throws ExecException, FieldIsNullException;

    public Schema getSchema();
}

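The interface simply declares a typed setter and getter for each supported data type, plus getSchema(). Most of the getters also declare FieldIsNullException, presumably thrown when the requested field is null.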
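Why have typed accessors at all? A generated class can store fields as primitives and only box them in the generic get(int). Below is a minimal sketch of that idea for a two-field schema (int, chararray). TypedTupleSketch and FieldIsNullSketchException are hypothetical stand-ins; the real interface throws Pig's ExecException and FieldIsNullException.

```java
// Hypothetical exception standing in for Pig's FieldIsNullException.
class FieldIsNullSketchException extends Exception {
    FieldIsNullSketchException(String msg) { super(msg); }
}

// Hypothetical "generated" tuple for the schema (int, chararray).
// Field 0 is a primitive int plus a null flag; field 1 is a String.
final class TypedTupleSketch {
    private int f0;
    private boolean f0IsNull = true;
    private String f1; // a reference type can simply hold null

    void setInt(int idx, int val) {
        checkIndex(idx, 0);
        f0 = val;
        f0IsNull = false;
    }

    int getInt(int idx) throws FieldIsNullSketchException {
        checkIndex(idx, 0);
        if (f0IsNull) {
            // an int return value cannot be null, so report it explicitly
            throw new FieldIsNullSketchException("Field " + idx + " is null");
        }
        return f0;
    }

    void setString(int idx, String val) {
        checkIndex(idx, 1);
        f1 = val;
    }

    String getString(int idx) {
        checkIndex(idx, 1);
        return f1;
    }

    // Generic accessor: boxes the primitive and maps the null flag to null.
    Object get(int idx) {
        if (idx == 0) return f0IsNull ? null : Integer.valueOf(f0);
        if (idx == 1) return f1;
        throw new IndexOutOfBoundsException("No field " + idx);
    }

    private static void checkIndex(int idx, int expected) {
        if (idx != expected) {
            throw new IllegalArgumentException("Field " + idx + " is not of the requested type");
        }
    }
}
```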

BinSedesTuple

Some of the comments:

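/**
 * This tuple has a faster (de)serialization mechanism. It is used for storing
 * intermediate data between Map and Reduce and between MR jobs.
 * This is for internal Pig use only. The serialization format can change, so do
 * not use it for storing any persistent data (i.e. in load/store functions).
 */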

Inheritance (finally, not an abstract class):

public class BinSedesTuple extends DefaultTuple

The constructors are blunt: every one of them just calls super().

BinSedesTuple() {
    super();
}

/**
 * Construct a tuple with a known number of fields. Package level so that
 * callers cannot invoke it directly.
 * @param size Number of fields to allocate in the tuple.
 */
BinSedesTuple(int size) {
    super(size);
}

/**
 * Construct a tuple from an existing list of objects. Package level so that
 * callers cannot invoke it directly.
 * @param c List of objects to turn into a tuple.
 */
BinSedesTuple(List<Object> c) {
    super(c);
}

The only part that differs from the parent class is the following code:

private static final InterSedes sedes = InterSedesFactory.getInterSedesInstance();

public static Class<? extends TupleRawComparator> getComparatorClass() {
    return InterSedesFactory.getInterSedesInstance().getTupleRawComparatorClass();
}

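In other words, it can be understood as an ordinary tuple plus an InterSedes, presumably the serializer/deserializer for intermediate data, which also supplies the raw comparator class returned by getComparatorClass().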
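The serializer returned by InterSedesFactory.getInterSedesInstance() is not shown in this post. The sketch below only illustrates the general idea of a fast internal binary format: each field is written as a one-byte type tag followed by a compact binary payload instead of text. The class TupleSedesSketch and its tag values are invented for illustration and are not Pig's format, which, as the comment warns, may change anyway.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical type-tagged binary (de)serializer for a flat tuple.
// The tag values are made up for this sketch; they are not Pig's.
final class TupleSedesSketch {
    private static final byte TAG_NULL = 0, TAG_INT = 1, TAG_LONG = 2, TAG_DOUBLE = 3, TAG_STRING = 4;

    static byte[] write(List<Object> fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(fields.size()); // field count first
        for (Object f : fields) {
            if (f == null) {
                out.writeByte(TAG_NULL);
            } else if (f instanceof Integer) {
                out.writeByte(TAG_INT);
                out.writeInt((Integer) f);
            } else if (f instanceof Long) {
                out.writeByte(TAG_LONG);
                out.writeLong((Long) f);
            } else if (f instanceof Double) {
                out.writeByte(TAG_DOUBLE);
                out.writeDouble((Double) f);
            } else if (f instanceof String) {
                out.writeByte(TAG_STRING);
                out.writeUTF((String) f); // 2-byte length + modified UTF-8
            } else {
                throw new IOException("Unsupported type: " + f.getClass());
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<Object> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        List<Object> fields = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte tag = in.readByte();
            switch (tag) {
                case TAG_NULL: fields.add(null); break;
                case TAG_INT: fields.add(in.readInt()); break;
                case TAG_LONG: fields.add(in.readLong()); break;
                case TAG_DOUBLE: fields.add(in.readDouble()); break;
                case TAG_STRING: fields.add(in.readUTF()); break;
                default: throw new IOException("Unknown tag " + tag);
            }
        }
        return fields;
    }
}
```

A tuple (7, 8L, 2.5, "pig", null) takes 34 bytes in this layout: 4 for the count, then 5, 9, 9, 6 and 1 bytes for the fields. Because the type tags are part of the byte stream, a raw comparator could in principle compare two serialized tuples without rebuilding the objects. That is presumably the motivation for getComparatorClass().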

Finally, the UML diagram:

NonWritableTuple

Some of the comments:

/**
 * A singleton Tuple type which is not picked up for writing by PigRecordWriter
 */

Inheritance:

public class NonWritableTuple extends AbstractTuple {

It is worth noting that DefaultTuple also extends AbstractTuple.

UML diagram

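The constructor is completely empty (a truly "non-writable" tuple!). Since the comment describes it as a singleton that PigRecordWriter does not pick up for writing, it appears to be a tuple used only in the middle of processing and discarded afterwards, never written to output.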
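How PigRecordWriter actually recognizes this tuple is not covered in this post. The sketch below only shows the general idea of a shared sentinel object that a writer silently skips. MarkerTuple, SketchRecordWriter and the List<String> standing in for the output are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sentinel: a single shared instance that carries no data.
final class MarkerTuple {
    static final MarkerTuple INSTANCE = new MarkerTuple();
    private MarkerTuple() { } // empty constructor, as in NonWritableTuple
}

// Hypothetical writer that skips the sentinel and records everything else.
final class SketchRecordWriter {
    final List<String> written = new ArrayList<>();

    void write(Object tuple) {
        if (tuple == MarkerTuple.INSTANCE) {
            return; // not picked up for writing
        }
        written.add(String.valueOf(tuple));
    }
}
```

Because there is only one instance, a reference comparison (==) is enough to recognize it, which is one plausible reason for making such a marker a singleton.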

TargetedTuple

Some of the comments:

/**
 * A tuple composed with the operators to which
 * it needs be attached
 */

Inheritance:

public class TargetedTuple extends AbstractTuple

UML

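Judging from the description, this tuple keeps a data structure that stores operators, namely targetOps. My first guess was something like the way a calculator evaluates arithmetic: push numbers and operators onto a stack, then pop and process them one by one. Read literally, though, the comment suggests a simpler role: targetOps lists the operators to which this tuple should be attached as input, so the tuple carries its own routing information along with its data.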
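Here is a minimal sketch of the "routing" reading of the comment: a tuple that carries its fields plus the keys of the operators it should be attached to. The SketchOperator interface, its attach method, the String keys and the dispatch loop are hypothetical illustrations, not Pig's API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical operator that can have a tuple's fields attached as input.
interface SketchOperator {
    void attach(List<Object> input);
}

// Hypothetical tuple carrying its data plus the keys of its target operators.
final class TargetedTupleSketch {
    final List<Object> fields;
    final List<String> targetOps; // keys of the operators this tuple is meant for

    TargetedTupleSketch(List<Object> fields, List<String> targetOps) {
        this.fields = fields;
        this.targetOps = targetOps;
    }

    // Attach the data to every target operator found in the plan.
    void dispatch(Map<String, SketchOperator> plan) {
        for (String key : targetOps) {
            SketchOperator op = plan.get(key);
            if (op == null) {
                throw new IllegalStateException("Unknown operator " + key);
            }
            op.attach(fields);
        }
    }
}
```

With a plan containing three operators, a tuple whose targetOps names only two of them is delivered to exactly those two, in the listed order.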

TimestampedTuple

Inheritance:

public class TimestampedTuple extends DefaultTuple {

UML diagram

There are two important members, timestamp and heartbeat. The class has no Javadoc, but the field comments explain what they are for:

protected double timestamp = 0;       // timestamp of this tuple
protected boolean heartbeat = false;  // true if this is a heartbeat (i.e. its only purpose is to convey a new timestamp; it carries no data)
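Based on the field comments, a heartbeat tuple exists only to move time forward. Below is a sketch of how a downstream consumer might treat such tuples. StreamClockSketch and onTuple are hypothetical, and the choice to keep the maximum timestamp seen (so that late tuples do not move the clock backwards) is an assumption of this sketch, not something the Pig source shown here specifies.

```java
// Hypothetical consumer of timestamped tuples: heartbeats only advance the
// clock, while data tuples advance the clock and are counted.
final class StreamClockSketch {
    double currentTime = 0; // latest timestamp seen, in seconds
    int dataTuples = 0;     // number of tuples that carried data

    void onTuple(double timestamp, boolean heartbeat) {
        currentTime = Math.max(currentTime, timestamp);
        if (heartbeat) {
            return; // nothing to process except the new timestamp
        }
        dataTuples++;
    }
}
```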

Constructors:

public TimestampedTuple(int numFields) {
    super(numFields);
}

public TimestampedTuple(String textLine, String delimiter, int timestampColumn,
                        SimpleDateFormat dateFormat){
    if (delimiter == null) {
        delimiter = defaultDelimiter;
    }
    String[] splitString = textLine.split(delimiter, -1);
    mFields = new ArrayList<Object>(splitString.length-1);
    for (int i = 0; i < splitString.length; i++) {
        if (i==timestampColumn){
            try{
                timestamp = dateFormat.parse(splitString[i]).getTime()/1000.0;
            }catch(ParseException e){
                log.error("Could not parse timestamp " + splitString[i]);
            }
        }else{
            mFields.add(splitString[i]);
        }
    }
}
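The second constructor splits a text line on a delimiter and parses one column as the timestamp, converting Date.getTime() milliseconds into fractional seconds. The remaining columns are kept as fields. Below is a standalone sketch of the same steps. TimestampedLineSketch is a hypothetical name, and the tab used as DEFAULT_DELIMITER is an assumption, because the real defaultDelimiter value is not shown in this post. Like the original, a parse failure is only reported (here on stderr rather than through log) and the timestamp stays at 0.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone version of TimestampedTuple's text-line constructor.
final class TimestampedLineSketch {
    static final String DEFAULT_DELIMITER = "\t"; // assumption: real default not shown

    double timestamp = 0; // seconds since the epoch
    final List<Object> fields = new ArrayList<>();

    TimestampedLineSketch(String textLine, String delimiter, int timestampColumn,
                          SimpleDateFormat dateFormat) {
        if (delimiter == null) {
            delimiter = DEFAULT_DELIMITER;
        }
        // limit -1 keeps trailing empty columns; split() treats the delimiter as a regex
        String[] columns = textLine.split(delimiter, -1);
        for (int i = 0; i < columns.length; i++) {
            if (i == timestampColumn) {
                try {
                    // getTime() is in milliseconds; dividing by 1000.0 keeps fractions
                    timestamp = dateFormat.parse(columns[i]).getTime() / 1000.0;
                } catch (ParseException e) {
                    System.err.println("Could not parse timestamp " + columns[i]);
                }
            } else {
                fields.add(columns[i]);
            }
        }
    }
}
```

One thing worth noticing in both versions: String.split interprets the delimiter as a regular expression, so a delimiter such as "|" would need to be escaped (for example with Pattern.quote) to be treated literally.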