Recent face recognition experiments on a major benchmark (LFW [14]) show stunning performance: a number of algorithms achieve near-perfect scores, surpassing human recognition rates. In this paper, we advocate evaluation at the million scale (LFW includes only 13K photos of 5K people). To this end, we have assembled the MegaFace dataset and created the first MegaFace challenge. Our dataset includes one million photos capturing more than 690K distinct individuals. The challenge evaluates the performance of algorithms with increasing numbers of "distractors" (from 10 to 1M) in the gallery set. We present both identification and verification performance, evaluate performance with respect to pose and a person's age, and compare performance as a function of training-data size (number of photos and number of people). We report results for state-of-the-art and baseline algorithms. The MegaFace dataset, baseline code, and evaluation scripts are all publicly released for further experimentation.
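The distractor-based protocol described above can be sketched in a few lines: rank-1 identification succeeds when a probe's true gallery match outscores every distractor. The embeddings, dimensions, and similarity measure below are illustrative assumptions (random toy vectors and cosine similarity), not the challenge's actual features or code.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128  # embedding dimension (hypothetical)

def rank1_accuracy(probes, matches, distractors):
    """Fraction of probes whose true match outscores every distractor,
    using cosine similarity on L2-normalised embeddings."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    probes, matches, distractors = map(norm, (probes, matches, distractors))
    true_score = np.sum(probes * matches, axis=1)           # (N,)
    distractor_best = (probes @ distractors.T).max(axis=1)  # (N,)
    return float(np.mean(true_score > distractor_best))

# Toy data: each match is a noisy copy of its probe; distractors are random.
probes = rng.normal(size=(200, D))
matches = probes + 0.1 * rng.normal(size=(200, D))
for n_distractors in (10, 100, 1000):
    acc = rank1_accuracy(probes, matches, rng.normal(size=(n_distractors, D)))
    print(f"{n_distractors:5d} distractors: rank-1 accuracy {acc:.3f}")
```

Growing the distractor pool from 10 toward 1M is what stresses an algorithm: each added distractor is one more chance for a spurious high similarity.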
Text classification (TC) is the task of automatically assigning documents to a fixed number of categories. TC is an important component in many text applications, and many of these applications perform preprocessing. There are different types of text preprocessing, e.g., conversion of uppercase letters into lowercase letters, HTML tag removal, stopword removal, punctuation mark removal, lemmatization, correction of common misspelled words, and reduction of replicated characters. We hypothesize that applying different combinations of preprocessing methods can improve TC results. Therefore, we performed an extensive and systematic set of TC experiments (our main research contribution) to explore the impact of all possible combinations of five or six basic preprocessing methods on four benchmark text corpora (and not samples of them), using three ML methods with separate training and test sets. The general conclusion (at least for the datasets examined) is that it is always advisable to try an extensive and systematic variety of preprocessing-method combinations in TC experiments, because doing so helps improve TC accuracy. For all the tested datasets, there was always at least one combination of basic preprocessing methods that could be recommended to significantly improve TC using a BOW representation. For three datasets, stopword removal was the only single preprocessing method that enabled a significant improvement compared to the baseline result using a bag of 1,000-word unigrams. For some of the datasets, there was minimal improvement when we removed HTML tags, performed spelling correction, removed punctuation marks, or reduced replicated characters. However, for the fourth dataset, stopword removal was not beneficial. Instead, the conversion of uppercase letters into lowercase letters was the only single preprocessing method that demonstrated a significant improvement compared to the baseline result.
The best result for this dataset was obtained when we performed spelling correction and conversion into lowercase letters. In general, for all the datasets processed, there was always at least one combination of basic preprocessing methods that could be recommended to improve the accuracy results when using a bag-of-words representation.
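The experimental design above, i.e., sweeping combinations of basic preprocessing methods over a bag-of-words pipeline, can be sketched as follows. This is a minimal illustration using scikit-learn on a hypothetical toy corpus; the grid covers only two of the paper's preprocessing methods (lowercasing and stopword removal), and the datasets, classifier, and evaluation here are stand-ins, not the paper's actual setup.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the benchmark datasets.
docs = ["The movie was great fun", "Terrible plot and bad acting",
        "GREAT ACTING, great film", "bad, boring, a waste of time"] * 10
labels = [1, 0, 1, 0] * 10

# Grid over two basic preprocessing choices; the full grid would also
# cover HTML-tag removal, spelling correction, punctuation removal, etc.
for lowercase, remove_stop in product([False, True], repeat=2):
    vec = CountVectorizer(lowercase=lowercase,
                          stop_words="english" if remove_stop else None,
                          max_features=1000)  # bag of 1,000-word unigrams
    clf = make_pipeline(vec, MultinomialNB())
    score = cross_val_score(clf, docs, labels, cv=5).mean()
    print(f"lowercase={lowercase!s:5} stopwords={remove_stop!s:5} acc={score:.3f}")
```

Enumerating the combinations explicitly, rather than tuning one method at a time, is what lets the best-performing combination for each dataset be identified.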
Both human activity and climate change can influence erosion rates and initiate rapid landscape change. Understanding the relative impact of these factors is critical to managing the risks of extreme erosion related to flooding and landslide occurrence. Here we present a 2100-year record of sediment mass accumulation and inferred erosion based on lacustrine sediment cores from Amherst Lake, Vermont, USA. Using deposition from August 2011 Tropical Storm Irene as a modern analogue, we identified distinct event deposits indicative of destructive erosion events. These deposits record a prolonged (multidecadal) interval of enhanced erosion following the initial storm-induced landscape disturbance. The direct impact of human land cover alteration is minimal in comparison to the more recent twentieth-century increase in catastrophic erosion, which is linked to overall wetter conditions that favor high erosion rates and more easily trigger landslides during periods of extreme precipitation.
Paleotemperature reconstructions are essential for distinguishing anthropogenic climate change from natural variability. An emerging method in paleolimnology is the use of branched glycerol dialkyl glycerol tetraethers (brGDGTs) in sediments to reconstruct temperature, but their application is hindered by a limited understanding of their sources, seasonal production, and transport. Here, we report seasonally resolved measurements of brGDGT production in the water column, in catchment soils, and in a sediment core from Basin Pond, a small, deep inland lake in Maine, USA. We find similar brGDGT distributions in water column and lake sediment samples, but the catchment soils have distinct brGDGT distributions, suggesting that (1) brGDGTs are produced within the lake and (2) this in situ production dominates the down-core sedimentary signal. Seasonally, depth-resolved measurements indicate that most brGDGT production occurs in late fall and at intermediate depths (18–30 m) in the water column. We utilize these observations to help interpret a Basin Pond brGDGT-based temperature reconstruction spanning the past 900 years. This record exhibits trends similar to a pollen record from the same site and to regional and global syntheses of terrestrial temperatures over the last millennium. However, the Basin Pond temperature record shows higher-frequency variability than has previously been captured by such an archive in the northeastern United States, potentially attributable to the North Atlantic Oscillation and volcanic or solar activity. This first brGDGT-based multi-centennial paleoreconstruction from this region contributes to our understanding of the production and fate of brGDGTs in lacustrine systems.
Effective management of operating room resources relies on accurate predictions of surgical case durations. This prediction problem is known to be particularly difficult in pediatric hospitals due to the extreme variation in pediatric patient populations. We pursue two supervised learning approaches. (1) We directly predict surgical case durations using features derived from electronic medical records and from hospital operational information. For this regression problem, we propose a novel metric for measuring prediction accuracy that captures key issues relevant to hospital operations, and we evaluate several prediction models; some are automated (they do not require input from surgeons) while others are semi-automated (they do require input from surgeons). (2) We consider a classification problem in which each prediction provided by a surgeon is classified as correct, an overestimate, or an underestimate. This classification mechanism builds on the metric mentioned above and could potentially be useful for detecting human errors. Both supervised learning approaches give insights into the feature-engineering process while creating the basis for decision support tools.
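The second task, labeling each surgeon's estimate as correct, an overestimate, or an underestimate, can be sketched as below. The symmetric ±10% tolerance band is a hypothetical stand-in: the paper derives its labels from its own operations-aware accuracy metric, which is not reproduced here.

```python
def classify_estimate(actual_min, surgeon_min, tolerance=0.10):
    """Label a surgeon's duration estimate against the actual duration.

    `tolerance` is a hypothetical fractional band (here +/-10% of the
    actual duration in minutes) within which an estimate counts as
    'correct'; the paper instead builds on its custom accuracy metric.
    """
    lo = actual_min * (1 - tolerance)
    hi = actual_min * (1 + tolerance)
    if surgeon_min < lo:
        return "underestimate"
    if surgeon_min > hi:
        return "overestimate"
    return "correct"

print(classify_estimate(120, 90))   # surgeon far below actual: underestimate
print(classify_estimate(120, 125))  # within the band: correct
print(classify_estimate(60, 90))    # surgeon far above actual: overestimate
```

Framing the problem this way turns a continuous error into three actionable labels, which is what makes it usable as a human-error detector in scheduling workflows.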
We have developed learning and interaction algorithms to support a human teaching hierarchical task models to a robot using a single demonstration in the context of a mixed-initiative interaction with bi-directional communication. In particular, we have identified and implemented two important heuristics for suggesting task groupings based on the physical structure of the manipulated artifact and on the data flow between tasks. We have evaluated our algorithms with users in a simulated environment and shown both that the overall approach is usable and that the grouping suggestions significantly improve the learning and interaction.