This work introduces a set of scalable algorithms for identifying patterns in daily human behavior. The patterns are extracted from multivariate temporal data collected from smartphones. We exploit the sensors available on these devices and identify frequent behavioral patterns at a temporal granularity inspired by the way individuals segment time into events. These patterns are useful both to end users and to third parties who provide services based on this information. We demonstrate our approach on two real-world datasets and show that our pattern identification algorithms are scalable. This scalability makes analysis feasible on small, resource-constrained devices such as smartwatches. Traditional data analysis systems usually run on a remote system outside the device, largely because the software and hardware restrictions of mobile and wearable devices limit scalability. Analyzing the data on the device gives the user control over the data, which preserves privacy, and removes the network costs.
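As a rough illustration of what "frequent behavioral patterns at a temporal granularity" could look like in code, the sketch below buckets sensor readings into fixed time slots and keeps the readings that recur on a large enough share of days. This is only a minimal sketch under stated assumptions: the record layout, the fixed-width slots, the `min_support` threshold, and the function name `frequent_patterns` are all hypothetical and are not taken from the abstract, which does not describe the algorithms in detail.

```python
from collections import defaultdict


def frequent_patterns(records, slot_minutes=60, min_support=0.6):
    """Return {(slot, sensor, value): support} for recurring readings.

    records: iterable of (day, minute_of_day, sensor, value) tuples.
    This layout is an assumption made for the illustration.
    Support is the fraction of observed days on which a reading
    appears in a given time slot.
    """
    days = set()
    days_seen = defaultdict(set)  # (slot, sensor, value) -> days it occurred on
    for day, minute, sensor, value in records:
        days.add(day)
        slot = minute // slot_minutes  # fixed-width slot: a simplification
        days_seen[(slot, sensor, value)].add(day)
    if not days:
        return {}
    total = len(days)
    return {
        key: len(seen) / total
        for key, seen in days_seen.items()
        if len(seen) / total >= min_support
    }


if __name__ == "__main__":
    sample = [
        (1, 8 * 60 + 10, "wifi", "home"),
        (2, 8 * 60 + 25, "wifi", "home"),
        (3, 8 * 60 + 40, "wifi", "home"),
        (1, 8 * 60 + 30, "app", "mail"),
    ]
    print(frequent_patterns(sample))
```

A single pass over the records with set-based counting keeps memory proportional to the number of distinct (slot, sensor, value) combinations, which is the kind of property that matters on a small device, although the scalability claims in the abstract refer to the authors' own algorithms, not to this sketch.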
The goal of this work is to systematically extract information from hacker forums, whose content is generally unstructured: the text of a post does not necessarily follow any writing rules. Many security initiatives and commercial entities harness readily available public information, but they seem to focus on structured sources. By contrast, we focus on the problem of analyzing text content in security forums. A key novelty is that we use user profiles and contextual features, together with a transfer learning approach and an embedding space, to identify and refine information that trivial analysis of a security forum would not reveal. We collect a wealth of data from five different security forums. Our contribution is twofold: (a) we develop a method to automatically identify malicious IP addresses reported in the forums, and (b) we propose a systematic method to identify user-specified threads of interest and classify them into four categories. We further showcase how this information can inform knowledge extraction from the forums. As cyberwars become more intense, early access to useful information becomes more imperative for removing the hackers' first-move advantage, and our work is a solid step in this direction.
Stellar feedback in dwarf galaxies plays a critical role in regulating star formation via galaxy-scale winds. Recent hydrodynamical zoom-in simulations of dwarf galaxies predict that the periodic outward flow of gas can change the gravitational potential sufficiently to cause radial migration of stars. To test the effect of bursty star formation on stellar migration, we examine star formation observables and sizes of 86 local dwarf galaxies. We find a correlation between the R-band half-light radius ($R_e$) and far-UV luminosity ($L_{\rm FUV}$) for stellar masses below $10^8\,M_\odot$ and a weak correlation between the $R_e$ and Hα luminosity ($L_{\rm H\alpha}$). We produce mock observations of eight low-mass galaxies from the FIRE-2 cosmological simulations and measure the similarity of the time sequences of $R_e$ and a number of star formation indicators with different timescales. Major episodes of $R_e$ time sequence align very well with the major episodes of star formation, with a delay of ∼50 Myr. This correlation decreases toward star formation rate indicators of shorter timescales such that $R_e$ is weakly correlated with $L_{\rm FUV}$ (10–100 Myr timescale) and is completely uncorrelated with $L_{\rm H\alpha}$ (a few Myr timescale), in agreement with the observations. Our findings based on FIRE-2 suggest that the R-band size of a galaxy reacts to star formation variations on a ∼50 Myr timescale. With the advent of a new generation of large space telescopes (e.g., JWST), this effect can be examined explicitly in galaxies at higher redshifts, where bursty star formation is more prominent.
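The idea of a size time sequence that follows star formation with a delay can be illustrated with a lagged correlation scan. The sketch below uses a plain Pearson correlation between a star formation series and a size series shifted by a trial lag; this is a swapped-in, generic technique and not necessarily the similarity measure used in the paper. The function names, the uniform sampling, and the toy series are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(driver, response, max_lag):
    """Scan lags 0..max_lag and return (lag, r) with the highest correlation.

    A lag of k compares driver[t] with response[t + k], i.e. the response
    trailing the driver by k samples. Both series are assumed to be sampled
    on the same uniform time grid.
    """
    best = (0, -2.0)  # any real correlation is >= -1, so this is always replaced
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]
        y = response[lag:]
        if len(x) < 3:
            break  # too few overlapping samples to be meaningful
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Toy series: the "size" repeats the "star formation" bursts two samples later.
    sfr = [0, 0, 5, 1, 0, 0, 4, 1, 0, 0, 0, 0]
    size = [0, 0] + sfr[:-2]
    print(best_lag(sfr, size, max_lag=4))
```

If each sample were spaced 25 Myr apart (a hypothetical choice), a best lag of two samples would correspond to a delay of 50 Myr.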
Is it possible to extract malicious IP addresses reported in security forums in an automatic way? This is the question at the heart of our work. We focus on security forums, where security professionals and hackers share knowledge and information, and often report misbehaving IP addresses. So far, there have only been a few efforts to extract information from such security forums. We propose RIPEx, a systematic approach to identify and label IP addresses in security forums by utilizing a cross-forum learning method. In more detail, the challenge is twofold: (a) distinguishing IP addresses from other numerical entities, such as software version numbers, and (b) classifying each IP address as benign or malicious. We propose an integrated solution that tackles both problems. A novelty of our approach is that it does not require training data for each new forum. Our approach transfers knowledge across forums: we use a classifier from our source forums to identify seed information for training a classifier on the target forum. We evaluate our method using data collected from five security forums with a total of 31K users and 542K posts. First, RIPEx can distinguish IP addresses from other numeric expressions with 95% precision and above 93% recall on average. Second, RIPEx identifies malicious IP addresses with an average precision of 88% and over 78% recall, using our cross-forum learning. Our work is a first step towards harnessing the wealth of useful information that can be found in security forums.
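To make challenge (a) concrete, the sketch below shows a purely syntactic first pass that pulls dotted-quad candidates out of post text and validates them with Python's standard `ipaddress` module. It is not RIPEx: the regular expression, the function name `extract_ipv4_candidates`, and the sample text are assumptions made for illustration. It also shows why syntax alone falls short, since a four-part version number such as `1.2.3.4` is still a valid IPv4 string and needs context to be ruled out.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer
# digit/dot run (so "2.4.1.10.3" yields no candidate at all).
CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def extract_ipv4_candidates(text):
    """Return syntactically valid IPv4 strings found in text, in order."""
    found = []
    for match in CANDIDATE.finditer(text):
        token = match.group(0)
        try:
            ipaddress.IPv4Address(token)  # rejects octets above 255
        except ValueError:
            continue
        found.append(token)
    return found


if __name__ == "__main__":
    post = (
        "Scanner at 185.220.101.4 hit us. Running nginx 1.18.0 "
        "and lib 2.4.1.10.3, also 999.1.1.1."
    )
    print(extract_ipv4_candidates(post))
```

Everything this filter lets through would still have to be classified using the surrounding words, which is the part the abstract assigns to the learned, cross-forum classifier.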