|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
org.apache.hadoop.filecache
Class DistributedCache
java.lang.Object org.apache.hadoop.filecache.DistributedCache
public class DistributedCache
- extends Object
Distribute application-specific large, read-only files efficiently.
DistributedCache
is a facility provided by the Map-Reduce
framework to cache files (text, archives, jars etc.) needed by applications.
Applications specify the files, via urls (hdfs:// or http://) to be cached
via the JobConf
.
The DistributedCache
assumes that the
files specified via hdfs:// urls are already present on the
FileSystem
at the path specified by the url.
The framework will copy the necessary files on to the slave node before any tasks for the job are executed on that node. Its efficiency stems from the fact that the files are only copied once per job and the ability to cache archives which are un-archived on the slaves.
DistributedCache
can be used to distribute simple, read-only
data/text files and/or more complex types such as archives, jars etc.
Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave nodes.
Jars may be optionally added to the classpath of the tasks, a rudimentary
software distribution mechanism. Files have execution permissions.
Optionally users can also direct it to symlink the distributed cache file(s)
into the working directory of the task.
DistributedCache
tracks modification timestamps of the cache
files. Clearly the cache files should not be modified by the application
or externally while the job is executing.
Here is an illustrative example on how to use the
DistributedCache
:
It is also very common to use the DistributedCache by using// Setting up the cache for the application 1. Copy the requisite files to theFileSystem
: $ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat $ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip $ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar $ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar $ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz $ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz 2. Setup the application'sJobConf
: JobConf job = new JobConf(); DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job); DistributedCache.addCacheArchive(new URI("/myapp/map.zip", job); DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job); DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar", job); DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz", job); DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz", job); 3. Use the cached files in theMapper
orReducer
: public static class MapClass extends MapReduceBase implements Mapper<K, V, K, V> { private Path[] localArchives; private Path[] localFiles; public void configure(JobConf job) { // Get the cached archives/files localArchives = DistributedCache.getLocalCacheArchives(job); localFiles = DistributedCache.getLocalCacheFiles(job); } public void map(K key, V value, OutputCollector<K, V> output, Reporter reporter) throws IOException { // Use data from the cached archives/files here // ... // ... output.collect(k, v); } }
GenericOptionsParser
.
This class includes methods that should be used by users
(specifically those mentioned in the example above, as well
as addArchiveToClassPath(Path, Configuration)
),
as well as methods intended for use by the MapReduce framework
(e.g., JobClient
). For implementation
details, see TrackerDistributedCacheManager
and
TaskDistributedCacheManager
.
Field Summary | |
---|---|
static String |
CACHE_ARCHIVES
Warning: CACHE_ARCHIVES is not a *public* constant. |
static String |
CACHE_ARCHIVES_SIZES
Warning: CACHE_ARCHIVES_SIZES is not a *public* constant. |
static String |
CACHE_ARCHIVES_TIMESTAMPS
Warning: CACHE_ARCHIVES_TIMESTAMPS is not a *public* constant. |
static String |
CACHE_FILES
Warning: CACHE_FILES is not a *public* constant. |
static String |
CACHE_FILES_SIZES
Warning: CACHE_FILES_SIZES is not a *public* constant. |
static String |
CACHE_FILES_TIMESTAMPS
Warning: CACHE_FILES_TIMESTAMPS is not a *public* constant. |
static String |
CACHE_LOCALARCHIVES
Warning: CACHE_LOCALARCHIVES is not a *public* constant. |
static String |
CACHE_LOCALFILES
Warning: CACHE_LOCALFILES is not a *public* constant. |
static String |
CACHE_SYMLINK
Warning: CACHE_SYMLINK is not a *public* constant. |
Constructor Summary | |
---|---|
DistributedCache()
|
Method Summary | |
---|---|
static void |
addArchiveToClassPath(Path archive,
Configuration conf)
Deprecated. Please use addArchiveToClassPath(Path, Configuration, FileSystem)
instead. The FileSystem should be obtained within an
appropriate doAs . |
static void |
addArchiveToClassPath(Path archive,
Configuration conf,
FileSystem fs)
Add an archive path to the current set of classpath entries. |
static void |
addCacheArchive(URI uri,
Configuration conf)
Add a archives to be localized to the conf. |
static void |
addCacheFile(URI uri,
Configuration conf)
Add a file to be localized to the conf. |
static void |
addFileToClassPath(Path file,
Configuration conf)
Deprecated. Please use addFileToClassPath(Path, Configuration, FileSystem)
instead. The FileSystem should be obtained within an
appropriate doAs . |
static void |
addFileToClassPath(Path file,
Configuration conf,
FileSystem fs)
Add a file path to the current set of classpath entries. |
static void |
addLocalArchives(Configuration conf,
String str)
Add a archive that has been localized to the conf. |
static void |
addLocalFiles(Configuration conf,
String str)
Add a file that has been localized to the conf.. |
static boolean |
checkURIs(URI[] uriFiles,
URI[] uriArchives)
This method checks if there is a conflict in the fragment names of the uris. |
static void |
createAllSymlink(Configuration conf,
File jobCacheDir,
File workDir)
Deprecated. Internal to MapReduce framework. Use DistributedCacheManager instead. |
static void |
createSymlink(Configuration conf)
This method allows you to create symlinks in the current working directory of the task to all the cache files/archives. |
static Path[] |
getArchiveClassPaths(Configuration conf)
Get the archive entries in classpath as an array of Path. |
static long[] |
getArchiveTimestamps(Configuration conf)
Get the timestamps of the archives. |
static URI[] |
getCacheArchives(Configuration conf)
Get cache archives set in the Configuration. |
static URI[] |
getCacheFiles(Configuration conf)
Get cache files set in the Configuration. |
static Path[] |
getFileClassPaths(Configuration conf)
Get the file entries in classpath as an array of Path. |
static FileStatus |
getFileStatus(Configuration conf,
URI cache)
Returns FileStatus of a given cache file on hdfs. |
static long[] |
getFileTimestamps(Configuration conf)
Get the timestamps of the files. |
static Path[] |
getLocalCacheArchives(Configuration conf)
Return the path array of the localized caches. |
static Path[] |
getLocalCacheFiles(Configuration conf)
Return the path array of the localized files. |
static boolean |
getSymlink(Configuration conf)
This method checks to see if symlinks are to be create for the localized cache files in the current working directory Used by internal DistributedCache code. |
static long |
getTimestamp(Configuration conf,
URI cache)
Returns mtime of a given cache file on hdfs. |
static void |
setArchiveTimestamps(Configuration conf,
String timestamps)
This is to check the timestamp of the archives to be localized. |
static void |
setCacheArchives(URI[] archives,
Configuration conf)
Set the configuration with the given set of archives. |
static void |
setCacheFiles(URI[] files,
Configuration conf)
Set the configuration with the given set of files. |
static void |
setFileTimestamps(Configuration conf,
String timestamps)
This is to check the timestamp of the files to be localized. |
static void |
setLocalArchives(Configuration conf,
String str)
Set the conf to contain the location for localized archives. |
static void |
setLocalFiles(Configuration conf,
String str)
Set the conf to contain the location for localized files. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
CACHE_FILES_SIZES
public static final String CACHE_FILES_SIZES
- Warning:
CACHE_FILES_SIZES
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_ARCHIVES_SIZES
public static final String CACHE_ARCHIVES_SIZES
- Warning:
CACHE_ARCHIVES_SIZES
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_ARCHIVES_TIMESTAMPS
public static final String CACHE_ARCHIVES_TIMESTAMPS
- Warning:
CACHE_ARCHIVES_TIMESTAMPS
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_FILES_TIMESTAMPS
public static final String CACHE_FILES_TIMESTAMPS
- Warning:
CACHE_FILES_TIMESTAMPS
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_ARCHIVES
public static final String CACHE_ARCHIVES
- Warning:
CACHE_ARCHIVES
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_FILES
public static final String CACHE_FILES
- Warning:
CACHE_FILES
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_LOCALARCHIVES
public static final String CACHE_LOCALARCHIVES
- Warning:
CACHE_LOCALARCHIVES
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_LOCALFILES
public static final String CACHE_LOCALFILES
- Warning:
CACHE_LOCALFILES
is not a *public* constant.- See Also:
- Constant Field Values
CACHE_SYMLINK
public static final String CACHE_SYMLINK
- Warning:
CACHE_SYMLINK
is not a *public* constant.- See Also:
- Constant Field Values
Constructor Detail |
---|
DistributedCache
public DistributedCache()
Method Detail |
---|
getFileStatus
public static FileStatus getFileStatus(Configuration conf, URI cache) throws IOException
- Returns
FileStatus
of a given cache file on hdfs. Internal to MapReduce.- Parameters:
conf
- configurationcache
- cache file- Returns:
FileStatus
of a given cache file on hdfs- Throws:
IOException
getTimestamp
public static long getTimestamp(Configuration conf, URI cache) throws IOException
- Returns mtime of a given cache file on hdfs. Internal to MapReduce.
- Parameters:
conf
- configurationcache
- cache file- Returns:
- mtime of a given cache file on hdfs
- Throws:
IOException
createAllSymlink
public static void createAllSymlink(Configuration conf, File jobCacheDir, File workDir) throws IOException
- Deprecated. Internal to MapReduce framework. Use DistributedCacheManager
instead.
- This method create symlinks for all files in a given dir in another directory
- Parameters:
conf
- the configurationjobCacheDir
- the target directory for creating symlinksworkDir
- the directory in which the symlinks are created- Throws:
IOException
- This method create symlinks for all files in a given dir in another directory
setCacheArchives
public static void setCacheArchives(URI[] archives, Configuration conf)
- Set the configuration with the given set of archives. Intended
to be used by user code.
- Parameters:
archives
- The list of archives that need to be localizedconf
- Configuration which will be changed
setCacheFiles
public static void setCacheFiles(URI[] files, Configuration conf)
- Set the configuration with the given set of files. Intended to be
used by user code.
- Parameters:
files
- The list of files that need to be localizedconf
- Configuration which will be changed
getCacheArchives
public static URI[] getCacheArchives(Configuration conf) throws IOException
- Get cache archives set in the Configuration. Used by
internal DistributedCache and MapReduce code.
- Parameters:
conf
- The configuration which contains the archives- Returns:
- An array of the caches set in the Configuration
- Throws:
IOException
getCacheFiles
public static URI[] getCacheFiles(Configuration conf) throws IOException
- Get cache files set in the Configuration. Used by internal
DistributedCache and MapReduce code.
- Parameters:
conf
- The configuration which contains the files- Returns:
- Am array of the files set in the Configuration
- Throws:
IOException
getLocalCacheArchives
public static Path[] getLocalCacheArchives(Configuration conf) throws IOException
- Return the path array of the localized caches. Intended to be used
by user code.
- Parameters:
conf
- Configuration that contains the localized archives- Returns:
- A path array of localized caches
- Throws:
IOException
getLocalCacheFiles
public static Path[] getLocalCacheFiles(Configuration conf) throws IOException
- Return the path array of the localized files. Intended to be used
by user code.
- Parameters:
conf
- Configuration that contains the localized files- Returns:
- A path array of localized files
- Throws:
IOException
getArchiveTimestamps
public static long[] getArchiveTimestamps(Configuration conf)
- Get the timestamps of the archives. Used by internal
DistributedCache and MapReduce code.
- Parameters:
conf
- The configuration which stored the timestamps- Returns:
- a long array of timestamps
- Throws:
IOException
getFileTimestamps
public static long[] getFileTimestamps(Configuration conf)
- Get the timestamps of the files. Used by internal
DistributedCache and MapReduce code.
- Parameters:
conf
- The configuration which stored the timestamps- Returns:
- a long array of timestamps
- Throws:
IOException
setArchiveTimestamps
public static void setArchiveTimestamps(Configuration conf, String timestamps)
- This is to check the timestamp of the archives to be localized.
Used by internal MapReduce code.
- Parameters:
conf
- Configuration which stores the timestamp'stimestamps
- comma separated list of timestamps of archives. The order should be the same as the order in which the archives are added.
setFileTimestamps
public static void setFileTimestamps(Configuration conf, String timestamps)
- This is to check the timestamp of the files to be localized.
Used by internal MapReduce code.
- Parameters:
conf
- Configuration which stores the timestamp'stimestamps
- comma separated list of timestamps of files. The order should be the same as the order in which the files are added.
setLocalArchives
public static void setLocalArchives(Configuration conf, String str)
- Set the conf to contain the location for localized archives. Used
by internal DistributedCache code.
- Parameters:
conf
- The conf to modify to contain the localized cachesstr
- a comma separated list of local archives
setLocalFiles
public static void setLocalFiles(Configuration conf, String str)
- Set the conf to contain the location for localized files. Used
by internal DistributedCache code.
- Parameters:
conf
- The conf to modify to contain the localized cachesstr
- a comma separated list of local files
addLocalArchives
public static void addLocalArchives(Configuration conf, String str)
- Add a archive that has been localized to the conf. Used
by internal DistributedCache code.
- Parameters:
conf
- The conf to modify to contain the localized cachesstr
- a comma separated list of local archives
addLocalFiles
public static void addLocalFiles(Configuration conf, String str)
- Add a file that has been localized to the conf.. Used
by internal DistributedCache code.
- Parameters:
conf
- The conf to modify to contain the localized cachesstr
- a comma separated list of local files
addCacheArchive
public static void addCacheArchive(URI uri, Configuration conf)
- Add a archives to be localized to the conf. Intended to
be used by user code.
- Parameters:
uri
- The uri of the cache to be localizedconf
- Configuration to add the cache to
addCacheFile
public static void addCacheFile(URI uri, Configuration conf)
- Add a file to be localized to the conf. Intended
to be used by user code.
- Parameters:
uri
- The uri of the cache to be localizedconf
- Configuration to add the cache to
addFileToClassPath
@Deprecated public static void addFileToClassPath(Path file, Configuration conf) throws IOException
- Deprecated. Please use
addFileToClassPath(Path, Configuration, FileSystem)
instead. TheFileSystem
should be obtained within an appropriatedoAs
.- Add a file path to the current set of classpath entries. It adds the file to cache as well. Intended to be used by user code.
- Parameters:
file
- Path of the file to be addedconf
- Configuration that contains the classpath setting- Throws:
IOException
- Add a file path to the current set of classpath entries. It adds the file to cache as well. Intended to be used by user code.
addFileToClassPath
public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs) throws IOException
- Add a file path to the current set of classpath entries. It adds the file
to cache as well. Intended to be used by user code.
- Parameters:
file
- Path of the file to be addedconf
- Configuration that contains the classpath settingfs
- FileSystem with respect to whicharchivefile
should be interpreted.- Throws:
IOException
getFileClassPaths
public static Path[] getFileClassPaths(Configuration conf)
- Get the file entries in classpath as an array of Path.
Used by internal DistributedCache code.
- Parameters:
conf
- Configuration that contains the classpath setting
addArchiveToClassPath
@Deprecated public static void addArchiveToClassPath(Path archive, Configuration conf) throws IOException
- Deprecated. Please use
addArchiveToClassPath(Path, Configuration, FileSystem)
instead. TheFileSystem
should be obtained within an appropriatedoAs
.- Add an archive path to the current set of classpath entries. It adds the archive to cache as well. Intended to be used by user code.
- Parameters:
archive
- Path of the archive to be addedconf
- Configuration that contains the classpath setting- Throws:
IOException
- Add an archive path to the current set of classpath entries. It adds the archive to cache as well. Intended to be used by user code.
addArchiveToClassPath
public static void addArchiveToClassPath(Path archive, Configuration conf, FileSystem fs) throws IOException
- Add an archive path to the current set of classpath entries. It adds the
archive to cache as well. Intended to be used by user code.
- Parameters:
archive
- Path of the archive to be addedconf
- Configuration that contains the classpath settingfs
- FileSystem with respect to whicharchive
should be interpreted.- Throws:
IOException
getArchiveClassPaths
public static Path[] getArchiveClassPaths(Configuration conf)
- Get the archive entries in classpath as an array of Path.
Used by internal DistributedCache code.
- Parameters:
conf
- Configuration that contains the classpath setting
createSymlink
public static void createSymlink(Configuration conf)
- This method allows you to create symlinks in the current working directory
of the task to all the cache files/archives.
Intended to be used by user code.
- Parameters:
conf
- the jobconf
getSymlink
public static boolean getSymlink(Configuration conf)
- This method checks to see if symlinks are to be create for the
localized cache files in the current working directory
Used by internal DistributedCache code.
- Parameters:
conf
- the jobconf- Returns:
- true if symlinks are to be created- else return false
checkURIs
public static boolean checkURIs(URI[] uriFiles, URI[] uriArchives)
- This method checks if there is a conflict in the fragment names
of the uris. Also makes sure that each uri has a fragment. It
is only to be called if you want to create symlinks for
the various archives and files. May be used by user code.
- Parameters:
uriFiles
- The uri array of urifilesuriArchives
- the uri array of uri archives
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
Copyright © 2009 The Apache Software Foundation