Yesterday I looked at AbstractFileSystem and learned that applications access files through the FileContext class. Today let's read its source code, starting with the class's very long Javadoc comment.
```java
/**
 * The FileContext class provides an interface to the application writer for
 * using the Hadoop file system.
 * It provides a set of methods for the usual operation: create, open,
 * list, etc
 *
 * *** Path Names ***
 *
 * The Hadoop file system supports a URI name space and URI names.
 * It offers a forest of file systems that can be referenced using fully
 * qualified URIs.
 * Two common Hadoop file systems implementations are
 *   - the local file system: file:///path
 *   - the hdfs file system: hdfs://nnAddress:nnPort/path
 *
 * While URI names are very flexible, they require knowing the name or
 * address of the server. For convenience one often wants to access the
 * default system in one's environment without knowing its name/address.
 * This has an additional benefit: it allows one to change one's default fs
 * (e.g. an admin moves an application from cluster1 to cluster2).
 *
 * To facilitate this, Hadoop supports a notion of a default file system.
 * The user can set his default file system, although this is
 * typically set up for you in your environment via your default config.
 * A default file system implies a default scheme and authority; slash-relative
 * names (such as /for/bar) are resolved relative to that default FS.
 * Similarly a user can also have working-directory-relative names (i.e. names
 * not starting with a slash). While the working directory is generally in the
 * same default FS, the wd can be in a different FS.
 *
 * Hence Hadoop path names can be one of:
 *   - fully qualified URI: scheme://authority/path
 *   - slash relative names: /path, relative to the default file system
 *   - wd-relative names: path, relative to the working dir
 *
 * **** The Role of the FileContext and configuration defaults ****
 *
 * The FileContext provides file namespace context for resolving file names;
 * it also contains the umask for permissions. In that sense it is like the
 * per-process file-related state in a Unix system.
 * These two properties
 *   - default file system (i.e. your slash)
 *   - umask
 * are, in general, obtained from the default configuration file
 * in your environment.
 *
 * The file system related SS (server-side) defaults are
 *   - the home directory (default is "/user/userName")
 *   - the initial wd (only for local fs)
 *   - replication factor
 *   - block size
 *   - buffer size
 *   - encryptDataTransfer
 *   - checksum option (checksumType and bytesPerChecksum)
 *
 * *** Usage Model for the FileContext class ***
 *
 * Example 1: use the default config read from the $HADOOP_CONFIG/core.xml.
 *   Unspecified values come from core-defaults.xml in the release jar.
 *   - myFContext = FileContext.getFileContext(); // uses the default config,
 *                                                // which has your default FS
 *   - myFContext.create(path, ...);
 *   - myFContext.setWorkingDir(path)
 *   - myFContext.open(path, ...);
 *
 * Example 2: get a FileContext with a specific URI as the default FS
 *   - myFContext = FileContext.getFileContext(URI)
 *   - myFContext.create(path, ...);
 *   - ...
 *
 * Example 3: FileContext with the local file system as the default
 *   - myFContext = FileContext.getLocalFSFileContext()
 *   - myFContext.create(path, ...);
 *   - ...
 *
 * Example 4: use a specific config
 *   - configX = someConfigSomeOnePassedToYou;
 *   - myFContext = getFileContext(configX); // configX is not changed,
 *                                           // it is passed down
 *   - myFContext.create(path, ...);
 *   - ...
 */
```
The FileContext class provides an interface for application writers, with methods for the usual operations: create, open, list, and so on.

Two common Hadoop file system implementations are:

- the local file system: file:///path
- the HDFS file system: hdfs://nnAddress:nnPort/path

URI naming is very flexible, but it requires knowing the server's name or address. Hadoop therefore supports the notion of a default file system, which the user can set. An added benefit is that it lets the default fs be changed (e.g., an admin moving an application from cluster1 to cluster2).

A default file system implies a default scheme and authority; slash-relative names (e.g. /for/bar) are resolved relative to that default FS. Similarly, a user can have working-directory-relative names (names not starting with a slash).

Hence, a Hadoop path name can be one of:

- a fully qualified URI: scheme://authority/path
- a slash-relative name: /path, relative to the default file system
- a wd-relative name: path, relative to the working directory
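To make the three name forms concrete, here is a self-contained sketch of how a name could be qualified against a default FS and a working directory. This is a simplified model, not the real Hadoop code (FileContext does this via fixRelativePart and path qualification on the AbstractFileSystem); the helper name, namenode address, and working directory are made up:

```java
import java.net.URI;

public class PathQualify {
    // Simplified model of FileContext's name resolution (hypothetical helper,
    // not the real Hadoop API): qualify a name against a default FS and a wd.
    static String qualify(String name, URI defaultFs, String workingDir) {
        if (name.contains("://")) {
            return name;                          // fully qualified URI: use as-is
        }
        String prefix = defaultFs.getScheme() + "://" + defaultFs.getAuthority();
        if (name.startsWith("/")) {
            return prefix + name;                 // slash-relative: default FS + path
        }
        return prefix + workingDir + "/" + name;  // wd-relative: default FS + wd + path
    }

    public static void main(String[] args) {
        URI defFs = URI.create("hdfs://nnAddress:9000");
        String wd = "/user/alice";
        System.out.println(qualify("hdfs://other:9000/a/b", defFs, wd));
        System.out.println(qualify("/for/bar", defFs, wd));
        System.out.println(qualify("data/log.txt", defFs, wd));
    }
}
```

Only the fully qualified form carries its own scheme and authority; the other two borrow them from the default FS, which is exactly why changing the default FS transparently redirects slash-relative and wd-relative names.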
```java
private FileContext(final AbstractFileSystem defFs,
    final FsPermission theUmask, final Configuration aConf) {
  defaultFS = defFs;
  umask = FsPermission.getUMask(aConf);
  conf = aConf;
  try {
    ugi = UserGroupInformation.getCurrentUser();
  } catch (IOException e) {
    LOG.error("Exception in getCurrentUser: ", e);
    throw new RuntimeException("Failed to get the current user " +
        "while creating a FileContext", e);
  }
  /*
   * Init the wd.
   * WorkingDir is implemented at the FileContext layer
   * NOT at the AbstractFileSystem layer.
   * If the DefaultFS, such as localFilesystem has a notion of
   * builtin WD, we use that as the initial WD.
   * Otherwise the WD is initialized to the home directory.
   */
  workingDir = defaultFS.getInitialWorkingDirectory();
  if (workingDir == null) {
    workingDir = defaultFS.getHomeDirectory();
  }
  resolveSymlinks = conf.getBoolean(
      CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
      CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
  util = new Util(); // for the inner class
}
```
The FileContext constructor takes three parameters:

- defFs: the default FS for this FileContext
- theUmask: apparently unused; a historical leftover, perhaps? The umask field is instead initialized with FsPermission.getUMask(aConf)
- aConf: the configuration
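The umask loaded via FsPermission.getUMask behaves like the Unix umask: bits set in the umask are cleared from the requested permission when a file is created. A minimal sketch of that arithmetic in plain Java (UmaskDemo is a hypothetical class, not part of Hadoop):

```java
public class UmaskDemo {
    // Unix-style umask application: bits set in the umask are cleared
    // from the requested permission.
    static int applyUmask(int permission, int umask) {
        return permission & ~umask;
    }

    public static void main(String[] args) {
        int requested = 0666; // rw-rw-rw- (octal)
        int umask = 0022;     // a typical default umask
        int effective = applyUmask(requested, umask);
        System.out.println(Integer.toOctalString(effective)); // 0666 & ~0022 -> 644
    }
}
```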
Now let's look at the common methods mentioned above, starting with create. Folded above it is another long block of Javadoc:
```java
/**
 * Create or overwrite file on indicated path and returns an output stream for
 * writing into the file.
 *
 * @param f the file name to open
 * @param createFlag gives the semantics of create; see {@link CreateFlag}
 * @param opts file creation options; see {@link Options.CreateOpts}.
 *   - Progress - to report progress on the operation - default null
 *   - Permission - umask is applied against permission: default is
 *     FsPermissions:getDefault()
 *   - CreateParent - create missing parent path; default is not
 *     to create parents
 *   - The defaults for the following are SS defaults of the file
 *     server implementing the target path. Not all parameters make sense
 *     for all kinds of file system - e.g. localFS ignores Blocksize,
 *     replication, checksum
 *       - BufferSize - buffer size used in FSDataOutputStream
 *       - Blocksize - block size for file blocks
 *       - ReplicationFactor - replication for blocks
 *       - ChecksumParam - checksum parameters. Server default is used
 *         if not specified.
 *
 * @throws AccessControlException If access is denied
 * @throws FileAlreadyExistsException If file f already exists
 * @throws FileNotFoundException If parent of f does not exist
 *           and createParent is false
 * @throws ParentNotDirectoryException If parent of f is not a directory
 * @throws UnsupportedFileSystemException If file system for f is
 *           not supported
 * @throws IOException If an I/O error occurred
 *
 * Exceptions applicable to file systems accessed over RPC:
 * @throws RpcClientException If an exception occurred in the RPC client
 * @throws RpcServerException If an exception occurred in the RPC server
 * @throws UnexpectedServerException If server implementation throws
 *           undeclared exception to RPC server
 *
 * RuntimeExceptions:
 * @throws InvalidPathException If path f is not valid
 */
```
```java
public FSDataOutputStream create(final Path f,
    final EnumSet<CreateFlag> createFlag, Options.CreateOpts... opts)
    throws AccessControlException, FileAlreadyExistsException,
    FileNotFoundException, ParentNotDirectoryException,
    UnsupportedFileSystemException, IOException {
  Path absF = fixRelativePart(f);

  // If one of the options is a permission, extract it & apply umask
  // If not, add a default Perms and apply umask;
  // AbstractFileSystem#create

  CreateOpts.Perms permOpt = CreateOpts.getOpt(CreateOpts.Perms.class, opts);
  FsPermission permission = (permOpt != null) ? permOpt.getValue() :
      FILE_DEFAULT_PERM;
  permission = permission.applyUMask(umask);

  final CreateOpts[] updatedOpts =
      CreateOpts.setOpt(CreateOpts.perms(permission), opts);
  return new FSLinkResolver<FSDataOutputStream>() {
    @Override
    public FSDataOutputStream next(final AbstractFileSystem fs, final Path p)
        throws IOException {
      return fs.create(p, createFlag, updatedOpts);
    }
  }.resolve(this, absF);
}
```
The create method creates or overwrites a file at the indicated path and returns an output stream for writing into it.
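Before resolving links, create normalizes the options: CreateOpts.getOpt scans the varargs for a Perms option, the default permission is used if none is present, the umask is applied, and CreateOpts.setOpt writes the masked permission back into the option array. A self-contained model of that pattern, using stand-in option classes rather than the real CreateOpts API:

```java
import java.util.Arrays;

public class CreateOptsDemo {
    // Stand-ins for Options.CreateOpts and its Perms/BufferSize options;
    // these are NOT the real Hadoop types.
    interface CreateOpt {}
    static class Perms implements CreateOpt {
        final int value;
        Perms(int value) { this.value = value; }
    }
    static class BufferSize implements CreateOpt {
        final int value;
        BufferSize(int value) { this.value = value; }
    }

    static final int FILE_DEFAULT_PERM = 0666; // plays the role of FILE_DEFAULT_PERM

    // Mirrors create()'s flow: getOpt (scan for a Perms option), fall back to
    // the default, apply the umask, then setOpt (write the masked perms back).
    static CreateOpt[] withMaskedPerms(int umask, CreateOpt... opts) {
        int idx = -1;
        for (int i = 0; i < opts.length; i++) {
            if (opts[i] instanceof Perms) { idx = i; break; }
        }
        int requested = (idx >= 0) ? ((Perms) opts[idx]).value : FILE_DEFAULT_PERM;
        Perms masked = new Perms(requested & ~umask); // applyUMask
        CreateOpt[] updated;
        if (idx >= 0) {
            updated = opts.clone();
            updated[idx] = masked;
        } else {
            updated = Arrays.copyOf(opts, opts.length + 1);
            updated[opts.length] = masked;
        }
        return updated;
    }

    static int permsOf(CreateOpt[] opts) {
        for (CreateOpt o : opts) {
            if (o instanceof Perms) return ((Perms) o).value;
        }
        return -1;
    }

    public static void main(String[] args) {
        // No Perms passed: default 0666 masked by 0022 -> 0644
        System.out.println(Integer.toOctalString(
            permsOf(withMaskedPerms(0022, new BufferSize(4096)))));
        // Explicit Perms 0600 masked by 0022 -> 0600
        System.out.println(Integer.toOctalString(
            permsOf(withMaskedPerms(0022, new Perms(0600)))));
    }
}
```

This is why the real code builds updatedOpts once, up front: the anonymous resolver below may call fs.create several times while chasing links, and the permission must be the same masked value on every attempt.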
The FSLinkResolver instantiated in the final return statement handles the case where the path contains a symbolic link:
```java
/**
 * Generic helper function overridden on instantiation to perform a
 * specific operation on the given file system using the given path
 * which may result in an UnresolvedLinkException.
 * @param fs AbstractFileSystem to perform the operation on.
 * @param p Path given the file system.
 * @return Generic type determined by the specific implementation.
 * @throws UnresolvedLinkException If symbolic link path could
 *           not be resolved
 * @throws IOException an I/O error occurred
 */
abstract public T next(final AbstractFileSystem fs, final Path p)
    throws IOException, UnresolvedLinkException;

/**
 * Performs the operation specified by the next function, calling it
 * repeatedly until all symlinks in the given path are resolved.
 * @param fc FileContext used to access file systems.
 * @param path The path to resolve symlinks on.
 * @return Generic type determined by the implementation of next.
 * @throws IOException
 */
public T resolve(final FileContext fc, final Path path) throws IOException {
  int count = 0;
  T in = null;
  Path p = path;
  // NB: More than one AbstractFileSystem can match a scheme, eg
  // "file" resolves to LocalFs but could have come by RawLocalFs.
  AbstractFileSystem fs = fc.getFSofPath(p);

  // Loop until all symlinks are resolved or the limit is reached
  for (boolean isLink = true; isLink;) {
    try {
      in = next(fs, p);
      isLink = false;
    } catch (UnresolvedLinkException e) {
      if (!fc.resolveSymlinks) {
        throw new IOException("Path " + path + " contains a symlink"
            + " and symlink resolution is disabled ("
            + CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY
            + ").", e);
      }
      if (!FileSystem.areSymlinksEnabled()) {
        throw new IOException("Symlink resolution is disabled in"
            + " this version of Hadoop.");
      }
      if (count++ > FsConstants.MAX_PATH_LINKS) {
        throw new IOException("Possible cyclic loop while " +
            "following symbolic link " + path);
      }
      // Resolve the first unresolved path component
      p = qualifySymlinkTarget(fs.getUri(), p, fs.getLinkTarget(p));
      fs = fc.getFSofPath(p);
    }
  }
  return in;
}
```
next is a generic helper function, overridden on instantiation, that performs a specific operation on the given file system with the given path; it may throw an UnresolvedLinkException.

resolve performs the operation specified by next, calling it repeatedly until all symbolic links in the given path are resolved.
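The retry loop in resolve can be modeled without Hadoop: attempt the operation, and each time it fails because of an unresolved link, follow one link and try again, giving up after a fixed number of hops to guard against cycles. A self-contained sketch (the map-backed "file system" and all names here are invented for illustration, and the per-link config checks are omitted):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class LinkResolverDemo {
    // Plays the role of UnresolvedLinkException, carrying the link target.
    static class UnresolvedLink extends Exception {
        final String target;
        UnresolvedLink(String target) { this.target = target; }
    }

    static final int MAX_PATH_LINKS = 32; // mirrors FsConstants.MAX_PATH_LINKS

    // Fake "file system": paths that are symlinks map to their targets.
    static final Map<String, String> SYMLINKS = new HashMap<>();

    // The attempted operation (plays the role of the overridden next());
    // throws while the path is still a link.
    static String open(String path) throws UnresolvedLink {
        String target = SYMLINKS.get(path);
        if (target != null) throw new UnresolvedLink(target);
        return "opened:" + path;
    }

    // The resolve() pattern: retry until no link remains or the limit trips.
    static String resolve(String path) throws IOException {
        int count = 0;
        String p = path;
        while (true) {
            try {
                return open(p);       // success: no unresolved link left
            } catch (UnresolvedLink e) {
                if (count++ > MAX_PATH_LINKS) {
                    throw new IOException(
                        "Possible cyclic loop following symbolic link " + path);
                }
                p = e.target;         // follow one link and retry
            }
        }
    }

    public static void main(String[] args) throws IOException {
        SYMLINKS.put("/a", "/b");
        SYMLINKS.put("/b", "/c");
        System.out.println(resolve("/a")); // follows /a -> /b -> /c
        SYMLINKS.put("/loop", "/loop");    // a cycle trips the counter
        try {
            resolve("/loop");
        } catch (IOException e) {
            System.out.println("cycle detected");
        }
    }
}
```

The exception-driven design keeps the fast path cheap: a path with no links costs exactly one call to next, and the loop machinery only engages when a link is actually encountered.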